What have we learned from Unicellular Genomes?

Propionibacterium acnes
• Responsible for acne, its genome
sequenced in 2004.
• It lives on human skin in sebaceous
follicles; feeds on sebum and this
stimulates immune response of
inflammation.
• Can we understand pimples?
Anatomy of acne
Propionibacterium acnes genome
• Sequenced by three different groups.
– 32 190 sequencing reactions
– 8.7-fold coverage of 2 560 265 bp genome
– Error rate of 0.0001
– Genome contains a single circular
chromosome and no additional plasmids.
– Annotation of 2333 putative genes allowed
reconstruction of its metabolism.
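The coverage figures above are tied together by simple arithmetic. The sketch below assumes the usual definition, coverage = reads × mean read length / genome length, and assumes the 0.0001 error rate is per base; the mean read length it prints is derived here, not reported on the slide.

```python
# Figures from the slide; the derived values are illustrative only.
N_READS = 32_190        # sequencing reactions
GENOME_BP = 2_560_265   # genome length in bp
COVERAGE = 8.7          # reported fold coverage
ERROR_RATE = 1e-4       # assumed to be errors per base


def mean_read_length(coverage, genome_bp, n_reads):
    """Solve coverage = n_reads * read_length / genome_bp for read_length."""
    return coverage * genome_bp / n_reads


def expected_errors(genome_bp, error_rate):
    """Expected number of wrong bases in the finished sequence."""
    return genome_bp * error_rate


read_len = mean_read_length(COVERAGE, GENOME_BP, N_READS)
errors = expected_errors(GENOME_BP, ERROR_RATE)
print(f"implied mean read length: {read_len:.0f} bp")
print(f"expected erroneous bases: {errors:.0f}")
```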
Propionibacterium acnes genome
• 12% encoded RNA products (rRNA and
tRNA).
• 1578 genes (68%) have orthologs in other
organisms and 20% do not match
anything.
GC skewing
• A non-uniform distribution of guanine and
cytosine bases on the two strands of DNA.
– The origin of replication has the lowest GC skew
(even distribution)
– The terminus of replication has a higher GC
skew.
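A minimal sketch of how GC skew can be computed, assuming the common definition (G − C) / (G + C) per window. Reading the minimum of the cumulative skew as the origin and the maximum as the terminus is a widely used convention, stated here as an assumption rather than as the method used for this genome. The function names and the toy sequence are hypothetical.

```python
def gc_skew(seq, window=1000):
    """Return (G - C) / (G + C) for consecutive non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative(values):
    """Running sum of the per-window skews."""
    total, out = 0.0, []
    for v in values:
        total += v
        out.append(total)
    return out


# Toy sequence: a C-rich half followed by a G-rich half.
toy = "ACCC" * 5 + "AGGG" * 5
skews = gc_skew(toy, window=4)
cum = cumulative(skews)
turning_point = cum.index(min(cum))  # window where the skew changes sign
print(skews)
print(turning_point)
```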
Horizontal Transfer
• Genes appeared in genome through an
unknown mechanism.
• To find alien genes, scan the genome with
a sliding window for segments that have
an abnormal GC content (either higher or
lower than the species average) and
evaluate the codon bias.
– Codon bias: which codon is used more often than the other
codons for a particular amino acid.
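The sliding-window scan described above can be sketched as follows. The window size, step and tolerance are arbitrary assumptions, and the helper names are hypothetical; a real analysis would tune them and compare codon counts per amino acid against the genome-wide usage.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def atypical_windows(genome, window=1000, step=500, tolerance=0.10):
    """Flag windows whose GC content deviates from the genome mean."""
    mean_gc = gc_content(genome)
    hits = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_content(genome[start:start + window])
        if abs(gc - mean_gc) > tolerance:  # higher or lower than average
            hits.append((start, round(gc, 3)))
    return hits


def codon_counts(orf):
    """Count in-frame codons of an ORF, as a starting point for codon bias."""
    orf = orf.upper()
    return Counter(orf[i:i + 3] for i in range(0, len(orf) - 2, 3))


# Toy genome: 50% GC background with a 40 bp AT-only island at position 120.
toy = "ACGT" * 30 + "AATT" * 10 + "ACGT" * 30
print(atypical_windows(toy, window=40, step=40))
print(codon_counts("CTGCTGTTA"))
```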
Transcriptional Phase Variation
• Variation in the number of Gs is used to produce
transcriptional variation.
• Initiation of transcription depends on the
number of consecutive guanines on a
particular strand at a critical location
upstream of the coding region.
• Regions of repeating bases are difficult to
replicate accurately, which affects
transcriptional efficiency.
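Candidate sites for this kind of variation can be located by listing runs of consecutive guanines. This is a minimal sketch; the minimum run length of 5 is an arbitrary assumption and `g_tracts` is a hypothetical helper name.

```python
import re


def g_tracts(seq, min_len=5):
    """Return (start, length) for every run of at least min_len guanines."""
    pattern = re.compile("G{%d,}" % min_len)
    return [(m.start(), len(m.group())) for m in pattern.finditer(seq.upper())]


# Toy upstream region with a single 7-G tract starting at position 2.
upstream = "AT" + "G" * 7 + "TTAGGCC"
print(g_tracts(upstream))
```

Comparing tract lengths at the same position across isolates would show whether the number of guanines actually varies.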
Which genes cause pimples?
• Metabolic reconstruction:
• Can grow anaerobically and aerobically.
• Has many enzymes to degrade lipids, esters and
amino acids.
• P. acnes digestive enzymes have an LPXTG motif
that targets proteins to the cell wall;
these enzymes chew away on your cells.
– cell-wall sorting signal LPXTG responsible for
covalently anchoring proteins to the cell-wall
peptidoglycan
– LPxTG, the target for cleavage and covalent coupling
to the peptidoglycan by enzymes called sortases
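Scanning predicted proteins for the motif is a simple pattern match, where X stands for any residue. This sketch only finds the five-residue motif; real sortase substrates have further features (for example a hydrophobic stretch after the motif) that it does not check. The toy protein is made up.

```python
import re

# L-P-any residue-T-G
LPXTG = re.compile(r"LP.TG")


def find_lpxtg(protein):
    """Return (position, matched motif) for each LPXTG occurrence."""
    return [(m.start(), m.group()) for m in LPXTG.finditer(protein.upper())]


toy_protein = "MKKAALPETGEEK"  # hypothetical sequence
print(find_lpxtg(toy_protein))
```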
Which genes cause pimples?
• The cell's exterior is decorated with
hyaluronate lyase, which destroys the
extracellular matrix binding your skin cells
together and thus facilitates further tissue
invasion and digestion.
LPxTG Database: Sortase
substrates
http://bamics3.cmbi.kun.nl/cgi-bin/jos/sortase_substrates/index.py
Stimulation of immune response
• Genome encodes five CAMP (Christie,
Atkins, Munch-Petersen) factors. CAMP
factors are secreted proteins that bind to
antibodies (IgG and IgM) and can form
pores in eukaryotic cell membranes.
• Lysis of our cells triggers an immune
response.
CAMP factors
• Proteins from BACTERIA and FUNGI that
are soluble enough to be secreted to
target ERYTHROCYTES and insert into
the membrane to form beta-barrel pores.
Biosynthesis may be regulated by
HEMOLYSIN FACTORS
Quorum Sensing
• Many bacteria have evolved the ability to
condition culture medium by secreting low-molecular-weight
signaling pheromones in
association with growth phase to control
expression of specific genes, a process termed
quorum sensing:
– Bioluminescence
– Antibiotic biosynthesis
– Pathogenicity
– Plasmid conjugal transfer
Quorum Sensing
• LuxS produces the precursor of
autoinducer-2 (AI-2), 4,5-dihydroxy-2,3-pentanedione (DPD),
while converting S-ribosylhomocysteine to homocysteine.
Are all bacteria Living in Us Bad for
Us?
• An average adult body is composed of
about 10 trillion human cells.
• Every milliliter of your large intestine’s
content is estimated to contain 10 billion
microbes and our intestines contain about
1 L.
• There are about 500 to 1000 different
species living in an adult’s intestines.
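Multiplying out the figures above shows why the comparison is striking. This is a back-of-the-envelope sketch that uses only the slide's round numbers.

```python
HUMAN_CELLS = 10 * 10**12        # about 10 trillion human cells
MICROBES_PER_ML = 10 * 10**9     # about 10 billion microbes per milliliter
INTESTINAL_CONTENT_ML = 1000     # about 1 L

gut_microbes = MICROBES_PER_ML * INTESTINAL_CONTENT_ML
ratio = gut_microbes / HUMAN_CELLS
print(f"gut microbes: {gut_microbes:.0e}")
print(f"microbes per human cell: {ratio:.1f}")
```

On these estimates the microbes in the large intestine alone are about as numerous as the body's own cells.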
Bacteroides thetaiotaomicron
• 31 million bases
• Assembly of 867 contigs with many gaps.
• Finished assembly by PCR
• 67 938 sequencing runs into a single 6 260 361
bp circular contig.
• Annotated 4779 predicted ORFs with 58%
orthologs of known function, 18% orthologs of
proteins with no known function and 24% with no
recognizable sequence similarity.
COGs
• Clusters of orthologous groups (COGs) are
functional categories of genes.
• They are a phylogenetic classification of
proteins encoded in complete genomes.
– Transcription
– Energy production, etc.
http://www.ncbi.nlm.nih.gov/COG/
Eukaryotic Clusters
ADH
CDH
Bacteroides thetaiotaomicron
• It can metabolize sugars.
• 170 genes for polysaccharide metabolism;
paralogs of 23 genes.
• E. coli has only 8 of them.
• It can also import sugars into its own
cytoplasm.
– Its two genes SusC and SusD are represented
by 163 paralogs.
Transposable Elements
• 63 TEs contain ORFs (open reading
frames) that help spread tetracycline and
erythromycin resistance between
individual cells and between species in the
microbiota of the gut.
Coding Capacity
• Gene density for B. thetaiotaomicron is
89%.
• Average size of a gene is 1170 bp, the largest
among bacteria.
– M. genitalium 1100 bp
– H. pylori 1000 bp
– E. coli 950 bp
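The 89% gene density can be cross-checked against figures given earlier for B. thetaiotaomicron (4779 predicted ORFs, a 6 260 361 bp genome). The sketch assumes that gene density means the fraction of the genome covered by genes and that genes do not overlap; `coding_fraction` is a hypothetical helper.

```python
def coding_fraction(n_genes, mean_gene_bp, genome_bp):
    """Approximate fraction of a genome occupied by genes."""
    return n_genes * mean_gene_bp / genome_bp


b_theta = coding_fraction(n_genes=4779, mean_gene_bp=1170, genome_bp=6_260_361)
print(f"B. thetaiotaomicron gene density: {b_theta:.1%}")
```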
Can Microbial Genomes Become
Dependent upon Human Genes?
• Mycoplasma genitalium has the second smallest bacterial genome of a self-replicating species (580 070 bp).
• A team at TIGR (The Institute for Genomic
Research)
– 5 people in 8 weeks assembled 8472 high-quality
sequencing reactions.
– Overall GC content is 32%
– GC skew places the origin of replication at the dnaA and
dnaN genes.
• Genes to the right of the origin are transcribed from the plus strand
• Genes to the left of the origin are transcribed from the minus strand
• tRNA and rRNA genes have higher GC content, 52 and 44%.
Genome Map
• 470 ORFs; 88% coding capacity; average gene
is 1040bp.
• Retained genes for energy metabolism, fatty
acid and phospholipid metabolism, replication,
transcription, and protein transport.
• Lost DNA it no longer needed:
– Amino acid synthesis
– Cofactors
– Cell envelope
– Regulatory factors
Synteny
• When a series of genes are conserved in
order and orientation between two or more
species, the genes are described as
syntenic.
– M. genitalium and H. influenzae have similar
gene orders with respect to two clusters of
ribosomal proteins.
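One simple way to look for syntenic blocks is to find the longest run of genes that appears in the same order in two genomes. The sketch below is a longest-common-contiguous-run search over lists of gene labels. The labels are placeholders, not real gene orders, and orientation is ignored unless it is encoded in the labels (for example as `("r1", "+")` tuples).

```python
def longest_syntenic_block(order_a, order_b):
    """Return the longest contiguous run of genes shared in the same order."""
    best_len, best_end = 0, 0
    prev = [0] * (len(order_b) + 1)
    for i, gene_a in enumerate(order_a, 1):
        cur = [0] * (len(order_b) + 1)
        for j, gene_b in enumerate(order_b, 1):
            if gene_a == gene_b:
                # Extend the run that ended at the previous gene in both lists.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return order_a[best_end - best_len:best_end]


# Placeholder gene orders for two hypothetical species.
species_1 = ["x1", "r1", "r2", "r3", "r4", "x2"]
species_2 = ["y1", "y2", "r1", "r2", "r3", "r4"]
print(longest_syntenic_block(species_1, species_2))
```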
Minimum Number of Genes
• Synthetic biology: to synthesize de novo
(from scratch) a functioning genome with
as few genes as possible.
– Bacillus subtilis – 190 genes
– M. genitalium – 260 genes
Bacteria vs. Viruses
• Smallest cellular genome is that of the archaeon N.
equitans (490 kb)
• HIV: 9200 nt
• SARS: 29797 nt
• Lambda: 48502 nt
• Acanthamoeba polyphaga mimivirus:
infects amoebae
– dsDNA, 1 181 404 bp linear chromosome
with 1262 ORFs
Mimivirus Genome
• 28% GC
• 90% coding capacity
• Uses biased codons lacking G or C; the codons that are
least common in the amoeba are the ones it uses least.
• It has proteins used for translation,
posttranslational modification and DNA repair; sounds
more like a eukaryote.
– Encodes topoisomerases
– Has a self-splicing intron
Is Mimivirus Alive?
• Mimivirus is most closely associated with
Eukaryota
• Infectious after 1 year of incubation at 4 °C.
• Survived 48 hours of desiccation and 1%
survived 55 °C.
• Mimivirus can participate in all major steps
of translation.
– A life form
– Highly modified virus?
Malaria
• 3 billion people in the world in tropical and
subtropical climates are affected.
• Malaria is caused by eukaryotic parasites of the genus
Plasmodium
• 2.7 million people die each year.
Plasmodium
• Plasmodium falciparum is the most lethal form,
transmitted to humans by the Anopheles mosquito.
– When an infected mosquito bites, parasites leave its salivary
glands, move to the liver and infect hepatocytes. They
mature in the hepatocytes and break out into RBCs.
– New parasites emerge from RBCs by bursting them,
releasing progeny and metabolic waste that cause fever
followed by chills.
– A few cells differentiate into gametes that move through
the blood and can be ingested by a new mosquito, where the gametes
form zygotes, undergo meiosis and move to the salivary glands.
Infection of RBCs
• RBC 6 microns; Plasmodium 1.2 microns
• Plasmodium evades the immune system
by sticking to RBCs and entering them.
• Apicoplast: an organelle that is the
remnant of an internalized alga, retaining its
small genome, which is needed for Plasmodium
survival.
Plasmodium Genome
• Three genomes
– Nuclear: chromosomes separated through
pulsed-field gel electrophoresis before random
fragmentation and cloning; 22 853 764 bp with
5268 ORFs; 19.4% GC; 52.6% coding
capacity; average gene length 2283 bp.
– Mitochondrial: 5967 bp encodes 3 proteins
– Apicoplastic: 29 422 bp encodes 30 proteins
Plasmodium is a eukaryote
• 54% of its genes contain one or more
introns, which average 13.5% GC (exons
have a higher GC%).
• 60% of ORFs have no known function
rRNA genes
• In many species rRNA genes appear in
linear clusters
• In Plasmodium, rRNA gene distribution
varies and expression is host specific; some
are expressed in the human host; the other set is
active in the mosquito
Centromeres and telomeres
• Centromeres are AT rich (97%) and
contain short tandem repeats.
• Telomeres have repeated sequences that
vary in length; some genes located near
telomeres have been duplicated many times,
so these genes have paralogs.
– Highly variable (polymorphic) gene families, var, rif and
stevor, may add variation to
the extracellular surface of the Plasmodium.
Hydropathy plot
http://expasy.org/cgi-bin/protscale.pl
Hydropathy plot
Plasmodium
• 31% of the encoded polypeptides are
predicted to be integral membrane proteins.
– 1% cell-to-cell adhesion
– 4% evasion of immune system
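Predictions like the 31% figure above typically rest on hydropathy plots: a hydrophobicity value is assigned to each residue and averaged over a sliding window, and long hydrophobic stretches suggest membrane-spanning segments. This is a minimal sketch using the Kyte-Doolittle scale; the window size is an assumption and `hydropathy_profile` is a hypothetical helper, not the ProtScale implementation.

```python
# Kyte-Doolittle hydropathy values per amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Mean hydropathy of each window of residues along the protein."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Toy protein: charged ends around a hydrophobic core.
toy = "KKDE" + "LLIVLLAVLL" + "EEKR"
profile = hydropathy_profile(toy, window=9)
peak = max(range(len(profile)), key=profile.__getitem__)
print(f"most hydrophobic window starts at residue {peak}")
```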
Apicoplast
• Derived from a chloroplast
• Synthesizes fatty acids, isoprenoids, and
heme groups
• 10% of all proteins help apicoplast DNA
replication and repair, transcription,
translation, posttranslational glycosylation
etc.
Food
• Plasmodium feeds on hemoglobin, digesting
it in a food vacuole.
• It has no genes for amino acid synthesis and stores
neither trehalose (the storage sugar in yeast)
nor glycogen; it ‘lives in the moment’
Is there a model eukaryote
genome?
• yeast
Yeast Genome
• Published in October 1996
• 12 068 kb genome of 16 chromosomes
• 6272 ORFs
– 38.3% GC with a coding capacity of 70.3%
– GC content for eukaryotes is generally higher in the
coding portions.
– Coding capacity is much lower than in bacteria
• Yeast has a gene every 2 kb
• Worm has a gene every 6 kb
• Humans have a gene every 30 kb
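The yeast figure follows directly from the genome size and ORF count given above. The sketch also derives a mean ORF length from the 70.3% coding capacity; that value is implied by the slide's numbers rather than reported on it, and it assumes ORFs do not overlap.

```python
YEAST_GENOME_KB = 12_068
YEAST_ORFS = 6272
YEAST_CODING_FRACTION = 0.703

# Average spacing between genes, in kb.
kb_per_gene = YEAST_GENOME_KB / YEAST_ORFS

# Mean ORF length implied by the coding capacity, in bp.
implied_orf_bp = YEAST_CODING_FRACTION * YEAST_GENOME_KB * 1000 / YEAST_ORFS

print(f"one gene every {kb_per_gene:.1f} kb")
print(f"implied mean ORF length: {implied_orf_bp:.0f} bp")
```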
Genome structure
• S. cerevisiae experienced genome
duplication events.
• Chromosomes V and X, IV and II, and III
and XIV have paralogous regions.
– Duplicated region on chr III contains four
genes, one of which is citrate synthase (cit2).
• Cit2 (chr III) is targeted to the peroxisome and Cit1 (chr XIV)
to the mitochondrion.