Comparative Genomics


The Human Genome Project
Public: International Human Genome Sequencing Consortium (aka HUGO)
Private: Celera Genomics, Inc. (aka TIGR)
The HGP
1st proposed in 1986
In addition to humans, the effort included E. coli, yeast, C. elegans, Drosophila, and mouse
Funded in 1988
Estimated cost: $3 billion
Got underway in 1990
Final cost: $2.6 billion
1st genome of a free-living organism (Haemophilus influenzae) sequenced in 1995 (TIGR)
Yeast sequenced in 1996
E. coli sequenced in 1997
C. elegans sequenced in 1998
Drosophila sequenced in 2000 (Celera)
The Human Sequence
Human draft sequence published in Feb. 2001 (HUGO & Celera)
The genome was sequenced about 4 times over
Contained errors and gaps
Gaps can exist:
1) within unfinished sequence clones
2) between sequenced BACs
3) between mapped BACs
The finished sequence, released in April 2003, was sequenced 8 times over, had 1 error in 10,000 bases, and did not contain significant gaps
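One way to see why doubling the coverage matters is a simple Poisson (Lander-Waterman style) model, in which reads land at random and the expected fraction of bases never sequenced is e^(-coverage). This model is an assumption added here for illustration, not something stated in the slides, and it ignores the clone-based gaps listed above. The function name `uncovered_fraction` is hypothetical.

```python
import math


def uncovered_fraction(coverage: float) -> float:
    """Expected fraction of bases hit by zero reads, assuming reads
    land uniformly at random (Poisson model; an idealization)."""
    return math.exp(-coverage)


# Coverage depths taken from the slides: ~4x for the draft, 8x for the finished sequence
for label, coverage in [("draft", 4), ("finished", 8)]:
    frac = uncovered_fraction(coverage)
    print(f"{label}: {coverage}x coverage -> {frac:.4%} of bases expected unsequenced")
```

Under this idealized model the draft would leave a bit under 2% of bases unsequenced, while the finished sequence would leave only a few hundredths of a percent.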
The “Typical” Human Gene
Feature                  Value
Size of exons            145 bp
# of exons               8.8
Size of introns          3,365 bp
Size of 3’ UTR           770 bp
Size of 5’ UTR           300 bp
Coding sequence size     1,340 bp
CDS                      447 aa
Genomic extent           27 kb
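The figures listed above can be cross-checked with a little arithmetic. The sketch below assumes that a gene with n exons has n - 1 introns and that each amino acid corresponds to one 3-base codon; the variable names are illustrative. Because each figure is a separate "typical" value, the pieces are not expected to add up exactly.

```python
# Values copied from the "typical" human gene figures above
n_exons = 8.8
intron_bp = 3365
utr3_bp = 770
utr5_bp = 300
cds_bp = 1340
protein_aa = 447

# Each amino acid is encoded by one 3-base codon
cds_from_protein = protein_aa * 3

# n exons are separated by n - 1 introns (assumption for this sketch)
total_intron_bp = (n_exons - 1) * intron_bp

# Mature transcript = coding sequence plus both untranslated regions
transcript_bp = cds_bp + utr5_bp + utr3_bp

# Rough genomic extent = transcript plus all introns
genomic_extent_bp = transcript_bp + total_intron_bp

print(f"CDS implied by protein length: {cds_from_protein} bp (listed: {cds_bp} bp)")
print(f"Total intron sequence: {total_intron_bp:.0f} bp")
print(f"Estimated genomic extent: {genomic_extent_bp / 1000:.1f} kb (listed: 27 kb)")
print(f"Intron share of the gene: {total_intron_bp / genomic_extent_bp:.0%}")
```

The estimate lands close to the listed 27 kb, and it shows that introns account for over 90% of the gene's genomic extent.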
The Number of Human Genes
[Bar chart: estimated number of human genes (y-axis from 0 to 140,000) for early estimates, later estimates, the draft sequence, and the final sequence]
# of Genes in Other Organisms
[Bar chart: number of genes (y-axis from 0 to 25,000) for M. g, E. c, S. c, D. m, C. e, H. s, and A. t]
Orthologs of Human Proteins
Where did the prokaryotic orthologs come from?
One possibility is horizontal transfer
41 genes may have been transferred in this way
For example: monoamine oxidases (MAOs)
These enzymes deactivate neurotransmitters
Another possibility is the loss of these genes over time so that most eukaryotes lack them
Functional Categories of Proteins
Families of Transcription Factors
Some surprises from the HGP
Not every gene has its own promoter
Not every gene encodes a protein
The number of genes in our genome is smaller than expected

Promoters: a number of adjacent genes are transcribed simultaneously. These genes were shown to share a promoter, much as the genes of a prokaryotic operon do.
Genes that do not encode proteins
tRNA
rRNA
snRNAs (small nuclear RNAs)
snoRNAs (small nucleolar RNAs)
ncRNAs (non-coding RNAs)
These are untranslated genes such as the let-7 gene in C. elegans. It encodes a 21-base RNA that binds to the mRNA of another gene.

How Can We Have So Few Genes?
Combinatorial Control
We are not just 1.5 times as complex as flies, even though we have about 1.5 times the number of genes.
If each gene has 2 states, on or off, then there are 2^13,600 different combinations in Drosophila but 2^21,000 different combinations in humans.

Alternative Splicing
Epigenetic Control
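The combinatorial control argument above can be checked numerically. The sketch below takes the gene counts from the slide (13,600 for Drosophila, 21,000 for humans) and assumes, as the slide does, that every gene is simply on or off. Variable and function names are illustrative.

```python
import math

fly_genes = 13_600
human_genes = 21_000

# Ratio of gene counts: about 1.5, as stated in the slide
gene_ratio = human_genes / fly_genes

# Number of on/off combinations is 2 ** (number of genes).
# Python integers are arbitrary precision, so these values are exact.
fly_states = 2 ** fly_genes
human_states = 2 ** human_genes

# How many times more combinations humans have than flies
state_ratio = human_states // fly_states
extra_genes = human_genes - fly_genes


def decimal_digits_of_power_of_two(exponent: int) -> int:
    """Number of decimal digits in 2 ** exponent, computed via log10
    so the huge integer never has to be converted to a string."""
    return math.floor(exponent * math.log10(2)) + 1


print(f"Gene count ratio: {gene_ratio:.2f}")
print(f"State ratio equals 2^{extra_genes}: {state_ratio == 2 ** extra_genes}")
print(f"Decimal digits in the state ratio: {decimal_digits_of_power_of_two(extra_genes)}")
```

So a roughly 1.5-fold difference in gene number corresponds to a 2^7,400-fold difference in possible expression states, a number with more than 2,000 decimal digits.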