BB30055: Genes and genomes

Transcript BB30055: Genes and genomes

BB30055: Genes and genomes
Genomes - Dr. MV Hejmadi ([email protected])
BB30055: Genomes - MVH
3 broad areas
(A) Genomes
(B) Applications genome projects
(C) Genome evolution
Why sequence the genome?
3 main reasons
• description of sequence of every gene valuable.
Includes regulatory regions which help in
understanding not only the molecular activities of the
cell but also ways in which they are controlled.
• identify & characterise important inheritable disease
genes or bacterial genes (for industrial use)
• Role of intergenic sequences e.g. satellites, intronic
regions etc
History of Human Genome Project (HGP)
1953 – DNA structure (Watson & Crick)
1972 – Recombinant DNA (Paul Berg)
1977 – DNA sequencing (Maxam, Gilbert and Sanger)
1985 – PCR technology (Kary Mullis)
1986 – automated sequencing (Leroy Hood & Lloyd Smith
1988 – IHGSC established (NIH, DOE) Watson leads
1990 – IHGSC scaled up, BLAST published (Lipman+Myers)
1992 – Watson quits, Venter sets up TIGR
1993 – F Collins heads IHGSC, Sanger Centre (Sulston)
1995 – cDNA microarray
1998 – Celera genomics (J Craig Venter)
2001 – Working draft of human genome sequence published
2003 – Finished sequence announced
Human Genome Project (HGP)
Goal: Obtain the entire DNA sequence of human genome
Players:
(A) International Human Genome Sequence Consortium
(IHGSC)
- public funding, free access to all, started earlier
- used mapping overlapping clones method
(B) Celera Genomics
– private funding, pay to view
- started in 1998
- used whole genome shotgun strategy
Whose genome is it anyway?
(A) International Human Genome Sequence Consortium
(IHGSC)
- composite from several different people generated
from 10-20 primary samples taken from numerous
anonymous donors across racial and ethnic groups
(B) Celera Genomics
– 5 different donors (one of whom was J Craig
Venter himself !!!)
sequencing genomes
Mapping phase
Sequencing phase
Strategies for sequencing the human genome
IHGSC
Celera
Result….
~30 - 40,000 protein-coding genes estimated
based on known genes and predictions
definite genes
possible genes
IHGSC
24,500
5000
Celera
26,383
12,000
Other genomes sequenced
1997
4,200 genes
2002
36,000 genes
1998
19,099 genes
2002
38,000 genes
Sept 2003
18,473
human orthologs
Science (26 Sep 2003)Vol301(5641)pp1854-1855
Genomics: World's smallest genome
•
the smallest genome known is the DNA of a
'nucleomorph' of Bigelowiella natans, a singlecelled algae of the group known as
chlorarachniophytes.
•
373,000 base pairs and a mere 331 genes
•
The nucleomorph is an evolutionary vestige that
was originally the nucleus of a eukaryotic cell. The
eukaryotic cell swallowed a cyanobacterium to
acquire a photosynthetic 'plastid' organelle, and
that cell was in turn engulfed by another cell to
produce B. natans as we know it. Now, most of the
nucleomorph's genome is concerned with its own
maintenance, and just 17 of its genes still exert any
control over the plastid. Its small size suggests it is
heading for evolutionary oblivion.
Proc. Natl Acad. Sci. USA 103, 9566–9571 (2006) by G McFadden, University of Melbourne, Australia
Organisation of human genome
Nuclear genome (3.2 Gbp)
Mitochondrial genome
24 types of chromosomes
Y- 51Mb and chr1 -279Mbp
http://www.ncbi.nlm.nih.gov/Genomes/
General organisation of human genome
Basic structure of a gene
Fig. 21.11
Polypeptide-coding regions
Gene organisation
Rare bicistronic transcription units
E.g. UBA52 transcription generates ubiquitin and
a ribosomal protein S27a
Non polypeptide–coding: RNA encoding
Class of RNA
Example types
Ribosomal RNA 16,23,18,28S
Function
Ribosomal subunits
Transfer RNA 22 mitochondrial
49 cytoplasmic
mRNA binding
Small nuclear U1,U2,U4,U5 etc
RNA(snRNA)
RNA splicing
Small nucleolar U3,U8 etc
RNA (snoRNA)
rRNA modification and processing
microRNA >200 types
(miRNA)
Regulatory RNA
XIST RNA
Inactivation of X chromosome
Imprinting H19 RNA
associated RNA
Antisense RNA >1500 types
Telomerase RNA
Genomic imprinting
Suppression of gene expression
Telomere formation
General organisation of human genome
Pseudogenes ()
non functional copies of an active gene.
May be either
a) Nonprocessed pseudogenes

May contain exons, introns & promoters but are
inactive due to inappropriate termination codons

Arise by gene duplication events usually in gene
clusters (e.g. a and b–globin gene clusters)
Pseudogenes in globin gene cluster
Gene fragments or truncated genes
Gene fragments: small
segments of a gene
(e.g. single exon from
a multiexon gene)
Truncated genes:
Short components
of functional genes
(e.g. 5’ or 3’ end)
Thought to arise due to unequal crossover or exchange
b) processed pseudogenes Thought to arise by
genomic insertion of
a cDNA as a result of
retrotransposition
Contributes to
overall repetitive
elements (<1%)
processed pseudogenes -
General organisation of human genome
Unique or low copy number
sequences
Non –coding, non repetitive and single copy
sequences of no known function or
significance
General organisation of human genome
Repetitive elements
Main classes based on origin
 Tandem repeats
 Interspersed repeats
 Segmental duplications
References
1) Chapter 9 pp 265-268
HMG 3 by Strachan and Read
2) Chapter 10: pp 339-348
Genetics from genes to genomes by
Hartwell et al (2/e)
3) Nature (2001) 409: pp 879-891
http://www.ncbi.nlm.nih.gov/Genomes/

BB30055: Genes and genomes

Transcript BB30055: Genes and genomes

Directory