MRC Slide Format


THE HUMAN GENOME SERIES
MAMMALIAN GENES
I. Conservation and Slow Evolution (today)
II. Functional Innovation and Rapid Change (Feb 10)
[Diagram: "Your genome!", divided into SLOW (Feb 3) and FAST (Feb 10)]
Questions
• Are we ‘just’ E. coli, except more so?
• Where do new genes come from?
• Do all genes evolve at the same rate?
• Do all tissues & organs evolve at the same rate?
• Where do we fit in the tree of life?
• What specifies the differences between us and
rodents, or us and chimps?
• What specifies the elevated complexity of us versus
other animals?
• Can we understand sequence variation among
humans?
• How can gene function contribute to behaviour?
Theodosius Dobzhansky (1900-1975)
“Nothing in Biology makes sense
except in the light of Evolution”
• Are we ‘just’ E. coli, except more so?
"Tout ce qui est vrai pour le Colibacille est vrai pour l'éléphant"
("What is true for E. coli is true for the elephant")
Jacques Monod (1972)
1965 Nobel laureate
"Tout ce qui est vrai pour le Colibacille est vrai pour l'éléphant ?"
Genes
5.4k
~ 30k
Mode of Protein Evolution
• De novo creation
• Gene fusion / fission
• Gene duplication
• Rapid sequence change
• Pseudogenisation

Genomes and Timelines wrt
[Figure: divergence timeline; labels include Archaea, Invertebrates, Rodents and Chimpanzee, with time points of 3000, 1000, 100, 75, 10, 5 and 1 Mya]
THE ORIGIN AND EVOLUTION OF MODEL ORGANISMS
Hedges, SB. Nature Reviews Genetics 3, 838-849 (2002)
Sequencing
Assembly
DNA Repeats
Gene Prediction
Genome Comparison
Gene Comparison
Gene Number
• Walter Gilbert [1980s]: 100k
• Antequera & Bird [1993]: 70-80k
• John Quackenbush et al. (TIGR) [2000]: 120k
• Ewing & Green [2000]: 30k
• Tetraodon analysis [2001]: 35k
• Human Genome Project (public) [2001]: ~31k
• Human Genome Project (Celera) [2001]: 24-40k
• Mouse Genome Project (public) [2002]: 25k-30k
• Lee Rowen [2003]: 25,947
Complexity & Gene Number?
[Two bar charts of gene count by species. First chart (axis 0 to 35,000): Human, Cress, Fly, Worm, S. pombe. Second chart (axis 0 to 60,000): Maize, Human, Cress, Fly, Worm, S. pombe.]
“Revealed: the secret of human
behaviour. Environment, not genes,
key to our acts”
“We simply do not have enough genes for this idea of biological
determinism to be right. The wonderful diversity of the human
species is not hard-wired in our genetic code. Our environments
are critical.” J Craig Venter February 10, 2001
Complexity?
• Is ‘culture’ proportional to population size?
• Is the complexity of the WWW proportional
to its size?
• Combinatorial argument
• Genetic interactions; alternative splicing;
non-genic regulation; post-transcriptional
& post-translational modifications
Complexity of Protein Sequences
[Bar chart: architecture numbers in 4 eukaryotic proteomes (Human, Fly, Worm, Yeast), split into TM, extra and intra; axis 0 to 2000. Data generated using SMART.]
Function
Orthologues and Paralogues
[Gene tree: from the cenancestor, events SP1, SP2 and DP2 lead to genes A1, B1, C1 and C2]
• C1 and C2 are paralogues
• A1 and B1 and (C1 and C2) are orthologues
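The two definitions on this slide can be sketched in code: a pair of genes is orthologous when their most recent common ancestor in the gene tree is a speciation event, and paralogous when it is a duplication event. The sketch below is illustrative only. It assumes SP1 and SP2 are speciation nodes, DP2 is a duplication node, and that the tree branches as SP1 → (A1, SP2), SP2 → (B1, DP2), DP2 → (C1, C2); the names `PARENT`, `EVENT` and `relationship` are hypothetical.

```python
# Gene tree stored as child -> parent (assumed topology, read off the slide labels).
PARENT = {
    "A1": "SP1", "SP2": "SP1",
    "B1": "SP2", "DP2": "SP2",
    "C1": "DP2", "C2": "DP2",
}
# Assumption: SP = speciation event, DP = duplication event.
EVENT = {"SP1": "speciation", "SP2": "speciation", "DP2": "duplication"}


def ancestors(node):
    """Return the internal nodes above `node`, nearest first."""
    path = []
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def relationship(gene_a, gene_b):
    """Classify two genes by the event at their most recent common ancestor."""
    seen = set(ancestors(gene_a))
    for node in ancestors(gene_b):
        if node in seen:
            if EVENT[node] == "speciation":
                return "orthologues"
            return "paralogues"
    raise ValueError("genes share no ancestor in this tree")


for pair in [("C1", "C2"), ("A1", "B1"), ("B1", "C1")]:
    print(pair, relationship(*pair))
```

Under these assumptions C1 and C2 come out as paralogues, while A1, B1 and either C gene are mutually orthologous, matching the slide.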
Only 1,195 human genes
were found that had single
orthologues in worm and fly.
Approx 95% of human genes
do not have obvious
orthologues in fly and worm
Data from Rich Copley and Peer Bork
Extracellular signalling proteins are
among the most different between animals
[Figure labels: Drosophila, Human, C. elegans; values: 220, 119, 12]
Antifreeze protein
type III from
Antarctic eel pout
(Lycodichthys dearborni)
Few sequence-based findings.
For example …
[359 residues]
Are we polyploid?
Human(x):Fly(1):Worm(1)
[Histogram: frequency (0 to 1400) against number of human paralogues (1 to 11+). Data from Richard Copley.]
Segmental Duplication in the Human Genome
Bailey et al. Science. 2002 297: 1003-7. Am J Hum Genet. 2003 73: 823-34
Horizontal Gene Transfer?
• The claim: “113 of these genes are widespread
among bacteria, but, among eukaryotes, appear
to be present only in vertebrates. These genes
[may have] entered the vertebrate (or
prevertebrate) lineage by horizontal transfer
from bacteria.”
Stanhope et al. Nature 2001 Jun 21; 411(6840): 940-4.
“Phylogenetic analyses do not support horizontal gene transfers
from bacteria to vertebrates.”
The coral Acropora millepora
shares a surprisingly large
number of genes with vertebrates.
Curr Biol. 2003 Dec 16; 13(24): 2190-5.
Gene loss is a powerful force in shaping gene repertoire.
"Tout ce qui est vrai pour le Colibacille est vrai pour l'éléphant" ?
‘New Domains’
23 of 94 InterPro families:
Defense and Immunity
e.g. IL, interferons, defensins
17 of 94 InterPro families:
Peripheral nervous system
e.g. Leptin, prion, ependymin
4 of 94 InterPro families:
Bone and cartilage
GLA, LINK, Calcitonin, osteopontin
3 of 94 InterPro families:
Lactation
Caseins (α, β, κ), somatotropin
2 of 94 InterPro families:
Vascular homeostasis
Natriuretic peptide, endothelin
5 of 94 InterPro families:
Dietary homeostasis
Glucagon, bombesin, colipase, gastrin, IGF-BP
18 of 94 InterPro families:
Other plasma factors
Uteroglobin, FN2, RNase A, GM-CSF etc.
Pseudogenes
• Two types: processed and non-processed
• 70% processed vs 30% non-processed
• ~ 20,000
Torrents et al. Genome Res. 2003 13: 2559-67.
SNPs
• Human single nucleotide polymorphisms (SNPs)
represent the most frequent type of human population
DNA variation.
• They occur with an average density of 1/1000
nucleotides of a genotype
• Non-synonymous coding SNPs (nsSNPs) comprise a
group of SNPs that are believed to have the highest
impact on phenotype.
• Ditto for SNPs in regulatory regions.
Synonymous change: TTA (Leu) → TTG (Leu)
Non-synonymous change: TTA (Leu) → TTT (Phe)
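The distinction above follows directly from the standard genetic code, and a short sketch can check both slide examples. This is an illustration only: it assumes the standard nuclear code, and the names `CODON_TABLE` and `classify_change` are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def classify_change(codon_before, codon_after):
    """Label a codon substitution by whether the encoded amino acid changes."""
    if CODON_TABLE[codon_before] == CODON_TABLE[codon_after]:
        return "synonymous"
    return "non-synonymous"


print(classify_change("TTA", "TTG"))  # Leu -> Leu
print(classify_change("TTA", "TTT"))  # Leu -> Phe
```

The single-letter codes are used in the table, so Leu is `L` and Phe is `F`; `*` marks a stop codon.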
What’s the difference between a mutation
and a polymorphism? Frequency!
A frequency value of 1% of the
polymorphic allele is usually taken as a
threshold between mutation and
polymorphism.
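The 1% convention can be written as a one-line rule. The sketch below is a hypothetical helper, not a standard API; treating an allele at exactly 1% as a polymorphism is an assumption, since the slide does not say which side the boundary falls on.

```python
# Conventional 1% cut-off quoted on the slide.
POLYMORPHISM_THRESHOLD = 0.01


def classify_variant(allele_frequency):
    """Label a variant allele by its population frequency (0.0 to 1.0)."""
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    # Assumption: a frequency of exactly 1% counts as a polymorphism.
    if allele_frequency >= POLYMORPHISM_THRESHOLD:
        return "polymorphism"
    return "mutation"


print(classify_variant(0.06))   # an allele carried at 6%
print(classify_variant(0.001))  # an allele carried at 0.1%
```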
An example of a polymorphic variant which disrupts a critical disulphide bond.
Although this variant (260 Cys→Tyr) in HLA-H protein is strongly associated
with hereditary haemochromatosis, its frequency is as high as 6% in
Northern Europeans with up to 14% in Ireland.
from Sunyaev et al. HMG 2001, Vol. 10, No. 6 591-597
Questions
• Are we ‘just’ E. coli, except more so? NO.
• Where do new genes come from?
• Do all genes evolve at the same rate?
• Do all tissues & organs evolve at the same rate?
• Where do we fit in the tree of life?
• What specifies the differences between us and
rodents, or us and chimps?
• What specifies the elevated complexity of us versus
other animals?
• Can we understand sequence variation among
humans?
• How can gene function contribute to behaviour?
After the break …
Comparative Genomics:
Humans vs Rodents
Human and mouse c-kit mutations show similar phenotypes.
The utility of mouse as a biomedical model for human disease is enhanced when
mutations in orthologous genes give similar phenotypes in both organisms.
In a visually striking example of this, the same pattern of hypopigmentation is seen
in (a) a patient with the piebald trait and (b) a mouse with dominant spotting, both
resulting from heterozygous mutations of the c-kit proto-oncogene.
Rodents as models for human disease
• All but a handful of human genes have
orthologous counterparts in the mouse
and rat genomes.
• In general, disease genes are not under
different selective constraints relative to all
other genes.
• Rodents are good model
organisms for human disease
Mouse equivalents of human
disease variants
Hs normal:
MAETLFWTPLLVVLLAGLGDTEAQQTTLHPLVGRVFVHTLDHETFLSLPEHVAVPPAVHI
Hs variant:
MAETLFWTPLLVVLLAGLGDTEAQQTTLHLLVGRVFVHTLDHETFLSLPEHVAVPPAVHI
Mm normal:
MAAAVTWIPLLAGLLAGLRDTKAQQTTLHLLVGRVFVHPLEHATFLRLPEHVAVPPTVRL
Equivalent disease variants?
– 23 human disease-associated sequence
variants whose variant amino acids are
normal in the mouse. Including:
• Breast Cancer (BRCA1 and BRCA2)
• Cystic Fibrosis (CFTR)
• Type 2D LGMD (SGCA)
• Becker Muscular Dystrophy (DMD)
– These variants are unlikely to be of
value in understanding human disease.
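The comparison behind the "Hs normal / Hs variant / Mm normal" alignment can be reproduced with a few lines of code. This is a minimal sketch that assumes the three sequences are already aligned without gaps (all three slide sequences are 60 residues long); the function name `variants_normal_in_mouse` is hypothetical.

```python
# Sequences copied from the slide; assumed to be an ungapped alignment.
HS_NORMAL = "MAETLFWTPLLVVLLAGLGDTEAQQTTLHPLVGRVFVHTLDHETFLSLPEHVAVPPAVHI"
HS_VARIANT = "MAETLFWTPLLVVLLAGLGDTEAQQTTLHLLVGRVFVHTLDHETFLSLPEHVAVPPAVHI"
MM_NORMAL = "MAAAVTWIPLLAGLLAGLRDTKAQQTTLHLLVGRVFVHPLEHATFLRLPEHVAVPPTVRL"


def variants_normal_in_mouse(hs_normal, hs_variant, mm_normal):
    """Find human variant residues that match the normal mouse residue.

    Returns (1-based position, human normal residue, variant residue) tuples.
    """
    if not (len(hs_normal) == len(hs_variant) == len(mm_normal)):
        raise ValueError("sequences must be aligned to equal length")
    hits = []
    columns = zip(hs_normal, hs_variant, mm_normal)
    for position, (ref, var, mouse) in enumerate(columns, start=1):
        if ref != var and var == mouse:
            hits.append((position, ref, var))
    return hits


print(variants_normal_in_mouse(HS_NORMAL, HS_VARIANT, MM_NORMAL))
# → [(30, 'P', 'L')]
```

For the slide's example, the human variant replaces proline with leucine at position 30, and leucine is the normal residue at that position in the mouse sequence.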
Mouse vs Human
• Do all genes evolve at the same rate?
• Do all tissues & organs evolve at the
same rate?
• Where do we fit in the tree of life?
• What specifies the differences between
us and rodents?
More organisms …
more comparisons …
~ 1000 more genes identified…
Guigó, R. et al. PNAS (2003) 100, 1140-1145
Sequence conservation
Figure 25. Sequence conservation between mouse and human genes
Mouse genome paper Nature 420, 520-562
Slow Evolution
The human spermidine synthase gene (SRM) and its mouse orthologue (Srm).
The fifth exon in the mouse gene (green) is interrupted by an intron in the
human orthologue.
Orthologues and Paralogues
[Gene tree: from the cenancestor, events SP1, SP2 and DP2 lead to genes A1, B1, C1 and C2]
• C1 and C2 are paralogues
• A1 and B1 and (C1 and C2) are orthologues
Human and mouse
“local synteny”
“Syntenic” regions contain orthologues!
Human and mouse chromosomes:
global orthology
How do we link genomes & genes
to evolution?
• Do all genes evolve at the same rate?
• Do all tissues & organs evolve at the
same rate?
• Where do we fit in the tree of life?
• What specifies the differences between
us and rodents?
Domain-regions are more conserved
[Histogram: percentage of sequences per interval (0% to 30%) against percentage identity (0% to 100%), for full-length proteins, domain-free regions and domain-containing regions]
Mouse-Human Orthologues % Identity
• sites not in domains: 64.4%
• cSNP sites: 67.1%
• all sites: 70.1%
• sites in domains: 88.9%
• disease sites: 90.3%
Little selection at cSNP sites
Significant selection at functional sites
A model of neutral evolution
• KS – the number of synonymous substitutions per
synonymous site
• takes advantage of the redundant genetic code
• 4D sites GCx (ALA), CCx (PRO), TCx (SER),
ACx (THR), CGx (ARG), GGx (GLY),
CTx (LEU), GTx (VAL)
• “how much would a gene have changed if selection
had not acted upon it?”
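The list of 4D (fourfold degenerate) sites above can be derived rather than memorised: a codon family is fourfold degenerate when all four third-position bases encode the same amino acid. The sketch below assumes the standard genetic code; `CODON_TABLE` and `fourfold_degenerate_families` are hypothetical names.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def fourfold_degenerate_families():
    """Map each fourfold degenerate family (e.g. 'GCx') to its amino acid."""
    families = {}
    for first, second in product(BASES, repeat=2):
        # Collect the amino acids encoded by all four third-position choices.
        encoded = {CODON_TABLE[first + second + third] for third in BASES}
        if len(encoded) == 1:
            families[first + second + "x"] = encoded.pop()
    return families


for family, amino_acid in sorted(fourfold_degenerate_families().items()):
    print(family, amino_acid)
```

This yields the same eight families as the slide: GCx (Ala), CCx (Pro), TCx (Ser), ACx (Thr), CGx (Arg), GGx (Gly), CTx (Leu) and GTx (Val). Any third-position change at these sites is synonymous.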
Thomas et al., Nature 424, 788-793
Neutral rates vary
see also Hardison et al. Genome Res. 2003 13: 13-26.
Variation in rates of mutation
or rates of repair?
• Transcription-associated mutational strand
asymmetry (Phil Green et al. Nature Genetics 33: 514-7)
• Associated with transcription-coupled repair
processes (Majewski, Am J Human Genet 73, 688-692)
• Genes transcribed in the germline at high levels,
when mutated, are repaired more readily than
those not transcribed in the germline.
• Majewski estimates that 71%-91% of genes are
transcribed in the germline!
Tissue-specific genes’ Ks
[Bar chart: median Ks value (axis 0 to 0.8) of tissue-specific genes by tissue, including fetal brain, testis, pituitary gland, amygdala, spinal cord, thyroid, uterus, adrenal gland, prostate, ovary, thymus, kidney, fetal liver, pancreas, whole blood, salivary gland, liver, placenta, heart, trachea, lung and spleen]
Winter et al. Genome Research 14:54-61, 2004
A model for non-neutral evolution
• KA – the number of non-synonymous (amino acid
changing) substitutions per non-synonymous site
• What proportion of possible amino acid-changing
substitutions has occurred?
KA/KS (dN/dS, ω): a model of selective pressure
• KA/KS << 1 (towards 0.0): conserving, purifying selection
• KA/KS > 1 (above 1.0): diversifying, positive diversifying selection
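The reading of the KA/KS scale can be sketched as a small function. This is illustrative only: the function name and the `tolerance` band around 1 (used to label ratios close to 1 as neutral) are assumptions, not part of the slide, and real analyses test the ratio statistically rather than by a fixed cut-off.

```python
def selection_regime(ka, ks, tolerance=0.05):
    """Interpret omega = KA/KS for one gene or region.

    `tolerance` is a hypothetical band around 1.0 treated as neutral.
    """
    if ks <= 0:
        raise ValueError("KS must be positive to form the ratio")
    omega = ka / ks
    if omega > 1.0 + tolerance:
        return "positive diversifying selection"
    if omega < 1.0 - tolerance:
        return "purifying selection"
    return "approximately neutral"


print(selection_regime(0.1, 0.6))  # omega well below 1
print(selection_regime(0.9, 0.6))  # omega above 1
```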
Domain-regions under higher purifying selection
[Histogram: percentage of sequences per interval (0% to 25%) against KA/KS (0.00 to 0.70), for full-length proteins, domain-free regions and domain-containing regions]
Domain-regions are under higher purifying selection
[Plot: percentage of sequences per interval (0% to 100%) against KA/KS (0.00 to 0.50), for full-length proteins, domain-free regions and domain-containing regions]
Higher purifying pressures in enzymes
Catalytic domains in enzymes are
• more conserved
• under higher purifying selection
than non-catalytic domains
Selective Pressures vary with
cellular compartment
For 521 domain families of known locale:
KA/KS values
• Secreted >> Nuclear > Cytoplasmic
Questions
• Are we ‘just’ E. coli, except more so? NO.
• Where do new genes come from? Next week.
• Do all genes evolve at the same rate? NO.
• Do all tissues & organs evolve at the same rate? NO.
• Where do we fit in the tree of life? Mammals!
• What specifies the differences between us and
rodents, or us and chimps? Next week.
• What specifies the elevated complexity of us versus
other animals? Unknown.
• Can we understand sequence variation among
humans? Hopefully, we will.
• How can gene function contribute to behaviour? Next
week.
MRC Functional Genetics Unit, Oxford
Leo Goodstadt
Richard Emes
Eitan Winter
Steve Rice
Scott Beatson
Nick Dickens
Caleb Webber
Michael Elkaim
Jose Duarte
Ensembl (Ewan Birney, Michele Clamp, Abel Ureta-Vidal);
Richard Copley (WTCHG, Oxford); Ziheng Yang (UCL);
The Human, Mouse and Rat Genome Sequencing Consortia; UCSC
Bibliography
Human Genome Papers:
Lander et al. Nature (2001) 409, 860-921
Venter et al. Science (2001) 291, 1304-1351.
Mouse Genome Paper:
Waterston et al. Nature (2002) 420, 520-62.
Rat Genome Paper: submitted.
Comparative genomics & evolutionary rates:
Hardison et al. Genome Res. (2003) 13, 13-26.
Adaptive evolution of genomes:
Emes et al. Hum Mol Genet. (2003) 12, 701-9
Wolfe & Li Nat Genet. (2003) 33 Suppl: 255-65