BB30055: Genes and genomes
Major insights from the HGP
What makes us human?
• Single-nucleotide substitutions between the human and
chimpanzee genomes occur at a mean rate of 1.23%
Nature 437, 50-51
(1 September 2005)
Major insights from the HGP
1) Gene size, content and distribution
2) Proteome content
3) SNP identification
4) Distribution of GC content
5) CpG islands
6) Recombination rates
7) Repeat content
Nature (2001) 15th Feb Vol 409 special issue; pgs 814 & 875-914.
1) Gene size
Gene content….
More genes: Twice as many as Drosophila / C. elegans
Uneven gene distribution: Gene-rich and gene-poor
regions
More paralogs: some gene families have expanded
their number of paralogs, e.g. the olfactory gene family
has ~1000 genes
More alternative transcripts: More RNA splice
variants are produced, expanding the number of primary
protein products about 5-fold (e.g. neurexin genes)
Gene distribution
Genes generally dispersed (~1 gene per 100kb)
Class III complex at HLA 6p21.3
Overlapping genes (transcribed from 2 DNA strands) - Rare
Genes within genes, e.g. NF1 gene
HMG3 Fig 9.8
Uneven gene distribution
Gene-rich
E.g. MHC on chromosome 6 has 60 genes
with a GC content of 54%
Gene-poor regions
82 gene deserts identified
? Large or unidentified genes
What is the functional significance of these
variations?
2) Proteome content
Proteome more complex than in invertebrates
Protein Domains (sections with identifiable
shape/function)
Domain arrangements in humans
largest total number of domains is 130
largest number of domain types per protein is 9
Mostly identical arrangement of domains
[Figure: Protein X drawn as a linear arrangement of repeated domains: A A B B B C C C C C]
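The two counts quoted for domain arrangements (total number of domains versus number of domain types per protein) are easy to confuse. The short Python sketch below illustrates the difference using the Protein X arrangement from the figure; the list `protein_x` and the function `domain_summary` are illustrative names, not part of the lecture material.

```python
from collections import Counter

# Illustrative domain architecture for "Protein X", read left to right.
protein_x = ["A", "A", "B", "B", "B", "C", "C", "C", "C", "C"]


def domain_summary(domains):
    """Return (total domains, number of distinct domain types, per-type counts)."""
    counts = Counter(domains)
    return len(domains), len(counts), dict(counts)


total, n_types, per_type = domain_summary(protein_x)
print("total domains:", total)
print("domain types:", n_types)
print("copies per type:", per_type)
```

Here Protein X has 10 domains in total but only 3 domain types, which is the distinction behind the human maxima of 130 and 9 quoted above.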
Proteome more complex than in invertebrates……
No huge difference in domain number in humans
BUT, frequency of domain sharing very high in human
proteins (structural proteins and proteins involved in
signal transduction and immune function)
However, only 3 cases where a combination of 3 domain
types is shared by human & yeast proteins,
e.g. carbamoyl-phosphate synthase (involved in the first 3
steps of de novo pyrimidine biosynthesis) has 7 domain
types, an arrangement which occurs once in human and yeast
but twice in Drosophila
3) SNPs (single nucleotide polymorphisms)
Sites that result from point
mutations in individual base pairs
More than 1.4million SNPs
identified (~ 1 in every 1.9kb length
on average)
~60,000 SNPs lie within exons and
untranslated regions (85% of exons
lie within 5kb of a SNP)
May or may not affect the ORF
(synonymous or non synonymous)
Most SNPs may be regulatory
Densities vary over regions and chromosomes
e.g. HLA region has a high SNP density,
reflecting maintenance of diverse haplotypes
over many millions of years
Nature (2001) 15th Feb Vol 409 special issue; pgs 821-823 & 928
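Whether a coding SNP is synonymous or non-synonymous depends only on whether the altered codon still encodes the same amino acid. A minimal Python sketch of that check follows, assuming the standard genetic code and DNA codons on the coding strand; `classify_snp` is a hypothetical helper, and the example codons are illustrative rather than taken from any gene in the lecture.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(codon, position, alt_base):
    """Classify a single-base change at `position` (0-2) within `codon`."""
    mutated = codon[:position] + alt_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    return "synonymous" if before == after else "non-synonymous"


print(classify_snp("GAA", 2, "G"))  # GAA (Glu) -> GAG (Glu): synonymous
print(classify_snp("AAT", 1, "G"))  # AAT (Asn) -> AGT (Ser): non-synonymous
```

Third-position changes are often synonymous because of the redundancy of the code, whereas first- and second-position changes usually alter the amino acid.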
Haplotype (haploid genotype)
• Haplotype is a set of single nucleotide polymorphisms
(SNPs) on a single chromatid that are statistically associated.
• Haplotypes are generally shared between populations but
their frequency can vary
International HapMap Project (www.hapmap.org)
– identifying common haplotypes in four populations from
different parts of the world.
– identifying "tag" SNPs that uniquely identify each haplotype
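The idea of a "tag" SNP can be illustrated with a toy example: given a handful of known haplotypes, find the smallest set of SNP positions whose alleles are enough to tell every haplotype apart. The Python sketch below does this by brute force. The haplotype strings and the function `find_tag_snps` are hypothetical, and real tag-SNP selection relies on linkage disequilibrium statistics across many samples rather than exhaustive search.

```python
from itertools import combinations

# Four hypothetical haplotypes, each listing the allele at six SNP sites.
haplotypes = ["ACGTAC", "ATGTAG", "GCGCAC", "GTACAG"]


def find_tag_snps(haps):
    """Return the smallest tuple of site indices that distinguishes all haplotypes."""
    n_sites = len(haps[0])
    n_distinct = len(set(haps))
    for k in range(1, n_sites + 1):
        for sites in combinations(range(n_sites), k):
            # Signature of each haplotype restricted to the candidate sites.
            signatures = {tuple(h[i] for i in sites) for h in haps}
            if len(signatures) == n_distinct:
                return sites
    return tuple(range(n_sites))


tags = find_tag_snps(haplotypes)
print("tag SNP sites:", tags)
```

In this toy data set no single site separates all four haplotypes, but two sites together do, so genotyping those two SNPs is enough to infer the alleles at the remaining sites.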
How does one distinguish sequence errors
from polymorphisms?
Sequence errors
Each piece of genome sequenced at least 10 times
to reduce error rate (0.01%)
Polymorphisms
Sequence variation between individuals (0.1%)
To be defined as a polymorphism, the altered
sequence must be present in a significant proportion
of the population (conventionally at least 1%)
Rate of polymorphisms in diploid human genome is about 1 in
500 bp
Nature (2001) 15th Feb Vol 409 special issue; pgs 821-823 & 928
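Sequencing each region many times reduces the error rate because an isolated miscall in one read is outvoted by the other reads covering the same base. The Python sketch below shows this with a simple majority vote per position. It assumes reads that are already aligned and of equal length; the reads and the function `consensus` are illustrative, and real assemblers also weight each base by its quality score.

```python
from collections import Counter


def consensus(reads):
    """Majority-vote base at each column of equal-length, pre-aligned reads."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


# Ten hypothetical reads of the same region; two carry isolated miscalls.
reads = ["ACGTACGT"] * 8 + ["ACGAACGT", "TCGTACGT"]
print(consensus(reads))
```

A true polymorphism behaves differently from a random error: it appears consistently in the reads from one individual and recurs across individuals in the population.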
SNPs and disease
SNPs……and risk of disease
N(291)S
SNPs……and pharmacogenomics
4) Distribution of GC content
Genome wide average of 41%
Huge regional variations exist
E.g. the distal 48 Mb of chromosome 1p averages 47%
but chromosome 13 has only 36%
Correlates with cytogenetic G-banding
(Giemsa staining)
dark G-bands – low GC content (37%)
light G-bands – high GC content (45%)
Nature (2001) 15th Feb Vol 409 special issue; pg 876-877
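Regional variation in GC content is usually measured by sliding or stepping a fixed-size window along the sequence and computing the GC fraction in each window. A minimal Python sketch follows, using non-overlapping windows on a short made-up sequence; the function names and the 10 bp window are illustrative choices (genome-scale analyses use windows of many kilobases).

```python
def gc_fraction(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def windowed_gc(seq, window):
    """GC fraction in consecutive non-overlapping windows of length `window`."""
    return [
        gc_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, window)
    ]


# Made-up sequence: an AT-rich stretch followed by a GC-rich stretch.
sequence = "ATATATATGC" + "GCGCGCGCAT"
print(windowed_gc(sequence, 10))
```

Plotting such window values along a chromosome gives the kind of profile that can be compared with dark and light G-bands.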
5) CpG islands
[Figure: CpG is methylated at C to give methyl-CpG; deamination of the methylated C converts it to T, giving TpG]
CpG islands show no methylation
Significance of CpG islands
1) Non-methylated CpG islands associated with the
5’ ends of genes
2) Usually overlap the promoter region
3) Aberrant methylation of CpG islands linked to
pathologies like cancer or epigenetic diseases
like Rett syndrome
http://www.sanger.ac.uk/HGP/cgi.shtml
Inheritance of CpG methylation
CpG islands
CpG dinucleotides are greatly under-represented in the human
genome (about 5 times less frequent than expected)
– ~28,890 CpG islands identified
~ 56% of human genes and 47% of the mouse
genes have CpG islands
Variable density
e.g. chromosome Y has 2.9/Mb but
chromosomes 16, 17 & 22 have 19-22/Mb
Average is 10.5/Mb
Nature (2001) 15th Feb Vol 409 special issue; pg 877-888
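The under-representation of CpG is commonly expressed as an observed/expected ratio: the number of CpG dinucleotides actually present, divided by the number expected from the C and G counts alone. The Python sketch below computes that ratio for a sequence; the function name and test sequences are illustrative. A ratio near 0.2 corresponds to the roughly five-fold depletion seen genome-wide, whereas CpG islands are usually defined by a much higher ratio (a commonly used criterion is at least 0.6, together with GC content of at least 50% over 200 bp or more).

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    n_cpg = seq.count("CG")  # CpG dinucleotides read 5' to 3'
    return n_cpg * len(seq) / (n_c * n_g)


print(cpg_obs_exp("CGCGCGCG"))  # CpG-rich toy sequence
print(cpg_obs_exp("CATGCATG"))  # contains C and G but no CpG
```

Applied in windows along a chromosome, the same ratio can be used to flag candidate CpG islands near the 5' ends of genes.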
6) Recombination rates
2 main observations
• Recombination rate increases with
decreasing arm length
• Recombination rate suppressed near
the centromeres and increases
towards the distal 20-35Mb
7) Repeat content
a) Age distribution
b) Comparison with other genomes
c) Variation in distribution of repeats
d) Distribution by GC content
e) Y chromosome
Nature (2001) 409: pp 881-891
a) Age distribution
overall decline in interspersed repeat activity in
hominid lineage in the past 35-40MYr
compared to mouse genome, which shows a
younger and more dynamic genome
Most interspersed repeats predate eutherian
radiation (confirms the slow rate of clearance
of nonfunctional sequence from vertebrate
genomes)
LINEs and SINEs have extremely long lives
2 major peaks of transposon activity
No DNA transposition in the past 50MYr
LTR retroposons teetering on the brink of
extinction
b) Comparison with other genomes
Higher density of transposable elements in the
euchromatic portion of the genome
Higher abundance of ancient transposons
60% of interspersed repeats (IR) made up of LINE1 and Alu
repeats, whereas DNA transposons represent only 6%
c) Variation in distribution of repeats
Some regions show either
High repeat density
e.g. chromosome Xp11 – a 525kb region shows
89% repeat density
Low repeat density
e.g. HOX homeobox gene cluster (<2% repeats)
(indicative of regulatory elements which have low
tolerance for insertions)
d) Distribution by GC content
High GC – gene rich ; High AT – gene poor
LINEs abundant in AT-rich regions
SINEs lower in AT-rich regions
Alu repeats in particular retained in actively transcribed
GC rich regions E.g. chromosome 19 has 5% Alus compared
to Y chromosome
e) The Y chromosome !
Unusually young genome (high tolerance to
gaining insertions)
Mutation rate is 2.1X higher in male germline
• Working draft published – Feb 2001
• Finished sequence – April 2003
• Annotation of genes going on
(refer: International Human Genome Sequencing
Consortium. Finishing the euchromatic
sequence of the human genome. Nature 21
October 2004 (doi: 10.1038/nature03001))
References
HMG 3 by Strachan and Read, Chapter 9: pp 265-268
Genetics from genes to genomes by Hartwell et al (2/e), Chapter 10: pp 339-348
Nature (2001) 409: pp 879-891
Nature (2005) for Chimp genome
Epigenetic disease – Rett Syndrome
Characterised by neurodevelopmental problems after birth
Caused by mutations in a gene on the X chromosome, MECP2 (methyl
CpG-binding protein 2), whose protein normally binds to
methylated CpG and represses gene expression
RS symptoms associated with the failure of mutated MECP2
to regulate transcription of a specific gene, DLX5, one allele
of which is normally imprinted. Without the MeCP2 protein,
production of the Dlx5 protein is increased, which influences
production of the neurotransmitter GABA in the brain