human accelerated region - School of Life Sciences

IB404 - 20 - Other primates – April 4
1. Two competing groups, one from the US Department of Energy (recall the original home of the
human genome project), and the other led by Svante Pääbo at the Max Planck Institute for
Evolutionary Anthropology in Leipzig, Germany, made several attempts to sequence nuclear
genomic DNA from Neanderthal bones using 454 sequencing (they had already done
mitochondrial DNA, which is relatively abundant). Homo neanderthalensis has only recently
been recognized as a separate species, living from ~500,000 to ~30,000 years ago in Europe.
Pääbo’s group finally succeeded in sequencing a draft genome from three bones 38,000-49,000
years old, published in 2010. This was technically challenging, requiring many improvements
in DNA extraction and in avoiding contamination with modern human DNA, and the result, obtained
with Illumina sequencing, is still a rather rough draft. The human reference genome was needed to
pick out the few Neanderthal sequences from the mess of contaminating bacterial DNA.
2. Other groups had already taken a different tack and specifically
amplified and sequenced particular genes, for example, the
melanocortin-1 receptor exons from two separate fossils. This
receptor partly determines skin and hair color in us, with various
mutations that reduce function leading to blonde or red hair and fair
skin. Amazingly, both their and Pääbo’s Neanderthal sequences shared
a novel mutation predicted to reduce receptor function, leading to the
inference that they and we independently evolved this adaptation to
cold northern climates, presumably to increase vitamin D synthesis.
Pääbo’s group found two additional genes implicated in skin
coloration with significant differences in Neanderthals.
3. The Pääbo group’s analysis of the whole genome sequence is complicated by two things. First,
there are many errors in these draft Neanderthal sequences caused by conversion of C to T
nucleotides. They therefore restrict much of their analysis to rarer changes, called transversions,
where purines are exchanged for pyrimidines and vice versa, e.g. A to T. Even then, the
differences between the Neanderthals, and between them and the available individual human
genome sequences, are small, only marginally greater than those among the humans. Nevertheless,
they were able to discern that Neanderthals share a small portion, variously estimated at around
5%, of their genome with Europeans, to the exclusion of Africans. This implies that after the
European human lineage left Africa, they interbred to some extent with Neanderthals living in
Europe. More recently Pääbo’s group sequenced a partial genome from a single finger bone
found in a cave in Siberia, known as Denisova, and it appears these Denisovans shared a similar
portion of their genome specifically with modern-day Asians, implying interbreeding in Asia too.
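The transversion filter described above can be sketched in a few lines. This is only a minimal illustration, assuming two already-aligned sequences of equal length; the function names are hypothetical and this is not the pipeline the Pääbo group used.

```python
# Purines and pyrimidines; a change within one class is a transition,
# a change between classes is a transversion.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def classify_substitution(ref_base, alt_base):
    """Label a pair of aligned bases as match, transition, transversion or unknown."""
    ref_base, alt_base = ref_base.upper(), alt_base.upper()
    if ref_base == alt_base:
        return "match"
    if ref_base not in VALID or alt_base not in VALID:
        return "unknown"  # gaps, N, or other ambiguity codes
    same_class = (ref_base in PURINES) == (alt_base in PURINES)
    return "transition" if same_class else "transversion"


def transversion_sites(modern, ancient):
    """Return 0-based positions where two aligned sequences differ by a transversion."""
    return [
        i
        for i, (m, a) in enumerate(zip(modern, ancient))
        if classify_substitution(m, a) == "transversion"
    ]
```

Because damage-induced C to T changes are transitions, they are dropped by a filter like `transversion_sites`, at the cost of also discarding all genuine transitions.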
4. Chimpanzees (Pan troglodytes, or Pan paniscus for bonobo or pygmy chimp) are our closest
living relatives, and we last shared a common ancestor ~7 Myr ago. When this estimate was first
derived from molecular comparisons in Allan Wilson’s lab at Berkeley about twenty-five years
ago (figure below based on hemoglobin amino acid sequence divergence), it was controversial
with paleontologists, who thought the split was much older, but it is now based on many gene
comparisons and is held with considerable confidence. Gorillas are our next closest relative, then
orangutans, among the great apes; followed by the lesser apes, represented by gibbons. In case you don’t
know, next are Old World monkeys, then New World monkeys, then tarsiers, then prosimians.
5. Simple DNA sequence comparisons show that for matching
sequences we differ by around 1.2%. But in addition there are
many indel differences, ranging from microindels of 1-10 bp to
differential transposon insertions up to 10 kbp, and some larger
indels. These are normally counted by molecular evolutionists as
single events, no matter how long they are, and since they occur
about 1/10 as often as simple base changes, they are a relatively
small contributor to the overall difference. But if you count the
actual numbers of bases involved in indels, which average 36 bp
in a large sample human/chimp comparison, then the difference
goes up to 5%.
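The two ways of counting in this paragraph can be checked with rough arithmetic. The sketch below uses only the figures quoted above (1.2% substitutions, indel events about 1/10 as frequent, 36 bp average indel length); the function and parameter names are hypothetical.

```python
def divergence_estimates(substitution_rate=0.012,
                         indel_event_ratio=0.1,
                         mean_indel_length=36):
    """Return (divergence counting indels as events, divergence counting indel bases)."""
    # Indel events per aligned base, given they occur ~1/10 as often as base changes.
    indel_event_rate = substitution_rate * indel_event_ratio
    # Each indel counted once, however long it is.
    by_events = substitution_rate + indel_event_rate
    # Each indel weighted by the number of bases it involves.
    by_bases = substitution_rate + indel_event_rate * mean_indel_length
    return by_events, by_bases


by_events, by_bases = divergence_estimates()
print(f"counting events: {by_events:.2%}, counting bases: {by_bases:.2%}")
```

Counted as events, indels add little to the 1.2%; counted by bases, they add roughly four percentage points, which is the same order as the ~5% figure quoted above.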
6. There are also large-scale chromosomal differences,
although whether they are of any importance in terms of
explaining any of the phenotypic differences is unclear. Given
the large number of chromosomal rearrangements between
mouse and human (~300), even if most occurred in the mouse
lineage, in 6 Myr there should be several between chimp and
human, and indeed chromosomes 1, 4, 5, 9, 12, 15, 16, 17 and 18
have major inversions, and small inversions have probably been
missed. In addition, our chromosome 2 is clearly a head-to-head
fusion of two acrocentric chromosomes in the other great apes
(it even retains pieces of telomeric sequence associated with the
centromere, remnants of the ape telomeres at the fusion point).
[Figure: chromosome 2]
7. Celera again got a jump on the public chimpanzee project. (Venter seemed unable to resist
tweaking the collective NIH-funded public projects, for example by also publishing in Science a
1.5X coverage sequence of his own poodle before the public project finished a boxer
sequence.) They decided to focus on the coding regions of all annotated human genes by
amplifying each exon (using primers designed to the flanking intron sequences, which work
most of the time in chimp because of the low DNA sequence divergence) and then sequencing
each directly. Most human/chimp exons are short enough to do this. They sequenced about
200,000 exons, and then sorted out those genes for which there was clearly human/chimp/mouse
orthology (shown by reciprocal best matches and microsynteny) yielding a conservative dataset
of some 7645 genes.
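The reciprocal-best-match criterion for orthology mentioned above can be sketched as follows. This is an illustration only: it assumes similarity scores have already been computed in both directions and stored in nested dictionaries, it ignores the microsynteny check, and the function names and data layout are hypothetical.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring target.

    scores has the form {query: {target: similarity_score}}.
    """
    return {
        query: max(targets, key=targets.get)
        for query, targets in scores.items()
        if targets
    }


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)
```

A gene whose best match points back to a different gene (for example, a recent paralog) fails the reciprocal test and is left out, which is why the resulting dataset is conservative.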
8. Their analysis focused on models of evolution of non-synonymous and synonymous sites in
the human and chimp lineages since the split, using mouse as an outgroup. This is somewhat
more sophisticated than simple Ka/Ks comparisons, but in principle is similar. The basic idea is
to look for genes/proteins whose evolution in either lineage since the split appears to have been
accelerated. Roughly 1,500 genes showed such acceleration in either the human or the chimp
lineage, but most of the truly convincing examples are accelerated in the human lineage.
These fall into several functional classes.
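To make the distinction between synonymous and non-synonymous changes concrete, here is a crude codon-by-codon count for two aligned, in-frame coding sequences. It is a sketch under simplifying assumptions (standard genetic code, no correction for the number of available sites or multiple hits) and is far simpler than the lineage models described above; the function name is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_changes(seq_a, seq_b):
    """Return (synonymous, non_synonymous) counts of differing codons."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned, in frame, and of equal length")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3].upper(), seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # skip gaps and ambiguous bases
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn
```

A gene under purifying selection shows many more synonymous than non-synonymous differences; an excess of non-synonymous differences on one lineage is the kind of signal the acceleration tests look for.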
9. The largest group are odorant receptors, perhaps suggesting important changes in how
chemical signals are perceived. However, it is well known that many of the odorant receptor genes
in our genome have become pseudogenized during primate evolution, and this may simply
represent an acceleration of this process in our lineage. They say that most of these seem to be
intact genes, although they could still be pseudogenes if crucial amino acid changes have
inactivated them.
10. Another set of genes is involved in amino acid catabolism. Here their interpretation is that
some of these genes/proteins might be important in metabolism of muscle proteins derived from a
diet richer in meat than chimpanzees, and especially gorillas, eat.
11. They list several other genes implicated in neurogenesis, skeletal development, etc.,
including, remarkably, several homeotic genes which are normally involved in major
developmental decisions of timing and positions of development of body regions and hence
might be involved in the overall morphological differences between chimps and humans.
12. Another set of human-accelerated genes are involved in speech and hearing. One amongst
five they identify involved in hearing is the alpha-tectorin gene, which is involved in making the
tectorial membrane of the inner ear. Single-amino acid changes in humans cause deafness, and a
mouse knockout is deaf. So they suggest that more in-depth studies of differences in human and
chimp hearing are warranted. Others had already identified FOXP2 as interesting (next slide).
13. This Celera analysis was an attempt to skim the cream from the milk. The public project
concerned the entire genome, which reveals all the other differences in promoters and other
regulatory regions, as well as all the junk and transposon differences. The hard part is sorting out
which differences really matter, and this is where Celera’s approach of focusing on the coding
regions worked well because only here can you compare synonymous and non-synonymous
changes and hence get a direct indication of the action of selection accelerating non-synonymous
changes. If the simple DNA sequence divergence of alignable regions is just 1%, that means there
are 30 million base changes in a genome of ~3 billion bp, so it will be hard to find those that
matter, let alone figure out all the indel differences. In addition, not all changes that matter
involve amino acid changes, as we will see.
14. The most famous single gene showing accelerated amino acid changes in humans is FOXP2,
a gene first cloned as the locus involved in an inherited condition involving severe deficits in
articulation and grammar. It encodes a transcription factor that is widely expressed in the brain
throughout life. The amino acid sequence has barely changed in mammalian evolution, yet as
shown by Pääbo’s lab and one other, two amino acids have changed in the human protein (bold).
15. A phylogenetic analysis of the synonymous and non-synonymous changes shows that this is a
statistically significant acceleration of non-synonymous changes in the human lineage, shown
in these two trees from the two groups (which give slightly different numbers of non-synonymous
and synonymous changes on each branch), indicating selection.
More detailed analysis of single nucleotide polymorphisms around the gene suggests that these
changes were relatively recent, perhaps as recent as 200,000 years ago, as modern Homo sapiens was
evolving. It will be interesting to see further work on exactly what the FOXP2 gene is doing.
Presumably it is one of many genes/proteins involved in the evolution of human speech.
16. Another example of an interesting change is that a
particular myosin, known as number 16 or MYH16, is
a pseudogene in humans while functional in chimps
and gorillas and other primates, due to a frameshifting
microdeletion of two base pairs in exon 18 of 42
(below), thus producing a truncated protein (* is stop
codon). This particular myosin is exclusively
expressed in the large masticatory muscles and appears
to be a major reason that ours (h and i) are highly
reduced in size compared with apes like gorillas (e and
f), and indeed Australopithecus species. Loss of this
myosin is correlated with great reductions in the sizes
of the individual muscle fibers as well as the overall
muscle size as shown in these figures.
17. In an interesting analysis, these authors also infer that the pseudogenization mutation occurred
roughly 2.4 Myr ago and propose that it might have allowed the evolution of the far larger cranium
of Homo sapiens. They do this by comparing the dN/dS ratios (similar to Ka/Ks) in the various
primate lineages to that in the human lineage. Assuming that a pseudogene would have a ratio of 1,
they estimate that the pseudogene formed roughly 2.4 Myr ago, because that would allow enough time
evolving as a pseudogene to raise the ratio on the human lineage to the observed 0.53. While this
is admittedly a rough estimate, it is another interesting use of the pattern of nucleotide changes.
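The logic of this dating argument can be written as a time-weighted average: if the gene evolved at a functional ratio for part of the human lineage and at the neutral ratio of 1 afterwards, the observed ratio sits between the two in proportion to the time spent in each state. The sketch below assumes a constant synonymous rate; the function name is hypothetical, and the functional ratio and lineage age in the usage lines are illustrative placeholders, not values given in these notes.

```python
def pseudogene_age(observed_ratio, functional_ratio, lineage_age_myr,
                   neutral_ratio=1.0):
    """Estimate how long (in Myr) a gene has evolved as a pseudogene.

    Solves  observed * T = functional * (T - t) + neutral * t  for t,
    where T is the age of the lineage.
    """
    if not functional_ratio < neutral_ratio:
        raise ValueError("functional ratio must be below the neutral ratio")
    if not functional_ratio <= observed_ratio <= neutral_ratio:
        raise ValueError("observed ratio must lie between functional and neutral")
    fraction_as_pseudogene = (
        (observed_ratio - functional_ratio) / (neutral_ratio - functional_ratio)
    )
    return fraction_as_pseudogene * lineage_age_myr


# Observed ratio 0.53 is from the text; the other two inputs are placeholders.
estimate = pseudogene_age(observed_ratio=0.53,
                          functional_ratio=0.25,
                          lineage_age_myr=6.5)
print(f"estimated time as a pseudogene: {estimate:.1f} Myr")
```

The estimate is sensitive to the assumed functional ratio, which is one reason the dating should be treated as rough.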
18. Many interesting observations have been made from the subsequently
published public chimpanzee genome project (2005). Perhaps the most
remarkable used the following logic. It turns out that in comparisons of
mammalian genomes, and indeed back to fish, there are a few hundred
regions of the genome that are remarkably conserved, something like
greater than 95% DNA identity over more than 200 bp. These ultra-conserved regions are
generally not parts of exons, because even for the
most conserved identical proteins, third codon changes would reduce the
identity below 95%. Indeed we have little idea what most of them are,
although some are clearly non-coding RNAs. Katherine Pollard, a postdoc
working in David Haussler’s laboratory at the University of California at
Santa Cruz (his lab generates the UCSC Genome Browser), wrote a
computer program to identify regions of the mammalian genome that have
been conserved for a long time, like these ultra-conserved regions, but
which have suddenly sped up in the human lineage since the split from
chimpanzees. She found about 200 such regions, and the top one, called
HAR1 for human accelerated region one, is just 118 bp long, and is
essentially identical across mammals, but has 18 changes in humans.
It turns out that HAR1 is indeed a non-coding RNA, and it is expressed
in the brain, but also in testes. It seems that it is essential somehow for
proper formation of the folded structures of the cerebrum, but precisely
how these 18 base changes, which clearly change the 2D and 3D shape
of this non-coding RNA (image), might have led to HAR1 contributing
to our larger brains remains unclear.
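The idea behind the search for human accelerated regions can be illustrated with a toy scan over a multiple alignment: find positions where all the other species agree but human differs, then look for windows in which such positions pile up. This is a simple count, not the statistical test a real analysis would use; the function names, window size and threshold are hypothetical.

```python
def human_specific_changes(human, others):
    """Return 0-based alignment positions where every other species has the
    same base and the human base differs from it (gap columns are skipped)."""
    positions = []
    for i, human_base in enumerate(human):
        column = {seq[i] for seq in others}
        if len(column) != 1 or "-" in column or human_base == "-":
            continue
        if human_base not in column:
            positions.append(i)
    return positions


def accelerated_windows(human, others, window=100, min_changes=5):
    """Return (start, n_changes) for each window with at least min_changes
    human-specific changes in an otherwise conserved alignment."""
    changed = set(human_specific_changes(human, others))
    hits = []
    for start in range(0, len(human) - window + 1):
        n_changes = sum(1 for p in range(start, start + window) if p in changed)
        if n_changes >= min_changes:
            hits.append((start, n_changes))
    return hits
```

A region like HAR1, with 18 human-specific changes in 118 bp that are otherwise nearly identical across mammals, would stand out sharply in a scan of this kind.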
19. There are genome projects underway on representatives of all
the other major lineages of primates (the gorilla genome, for example,
was just published), including the rhesus macaque Macaca mulatta from
Asia, which is a major biomedical experimental organism. Its divergence
from humans and chimps is around 25 Myr. You can see the
consequences of time in terms of the numbers of chromosomal
rearrangements on these lineages. A detailed view below shows that,
much like in flies, the vast majority of these are intrachromosomal,
with only a few translocations revealed by color combinations. The
authors identify a number of genes showing signs of positive selection in
these three primates, but these analyses really require more species.