Transcript Gene

Human
Molecular Genetics
Institute of Medical Genetics
Outline of this chapter
• Definition
• Structure
• Organization
Gene
Molecular definition:
DNA sequence encoding protein
What are the problems with this
definition?
Gene definition caveats
• Some genomes are RNA instead of DNA
• Some gene products are RNA (tRNA, rRNA, and others) instead of protein
• Some nucleic acid sequences that do not encode gene products (noncoding regions) are necessary for production of the gene product (RNA or protein)

Gene
• A gene is a segment of DNA encoding information that leads to a functional product (RNA or polypeptide chain).
• The most important feature of a gene is that it must code for a functional product.
• The human genome contains roughly 20,000 protein-coding genes; earlier estimates of 30,000 to 35,000 have been revised downward.
Hybridization of mRNA and
DNA
Eukaryotic genes are split genes
A split gene includes coding regions and noncoding regions.
A “Simple” Eukaryotic Gene
[Figure: gene map, 5’ to 3’: promoter/control region, transcription start site, 5’ untranslated region, exon 1, intron 1, exon 2, intron 2, exon 3, 3’ untranslated region, terminator sequence. The RNA transcript, 5’ to 3’, contains exon 1, intron 1, exon 2, intron 2, exon 3.]
Gene Structure
• Exons
• Introns
• Splice junctions
• Regulatory sequences
- Promoter/proximal control elements
- Enhancer/silencer
- Terminator
Exons
• Segment of a gene that is decoded to give an mRNA product or a mature RNA product.
• Individual exons may contain coding DNA or noncoding DNA (untranslated sequences, UTS).
Coding region
Nucleotides (open reading frame) encoding the amino acid sequence of a protein
Introns
• Noncoding DNA that separates neighboring exons in a gene.
• During gene expression, introns, like exons, are transcribed into RNA, but the transcribed intron sequences are subsequently removed by RNA splicing and are not present in mRNA.
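The removal of introns described above can be sketched in a few lines of Python. This is only an illustration: the toy transcript, the exon coordinates, and the function name `splice` are hypothetical, and real splicing is carried out by the spliceosome rather than by coordinate lookup.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments and drop everything between them.

    exons: (start, end) pairs, 0-based, end-exclusive, in 5' to 3' order.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical toy transcript: exons in upper case, introns in lower case.
pre_mrna = "AUGGCC" + "guaaguuuuuuuag" + "GAUUAC" + "gugaguccccccag" + "UGA"
exons = [(0, 6), (20, 26), (40, 43)]

mature = splice(pre_mrna, exons)
print(mature)  # AUGGCCGAUUACUGA
```

The primary transcript contains both exons and introns; only the exon segments remain in the spliced product.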
Splice junction
(exon/intron boundary)
• Splice donor site: the junction between the end of an exon and the start of the downstream intron; the intron begins with the dinucleotide GT.
• Splice acceptor site: the junction between the end of an intron, which terminates in the dinucleotide AG, and the start of the next exon.
• Branch site: a third conserved intronic sequence that is functionally important in splicing.
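As a rough illustration of the donor and acceptor dinucleotides, the following Python sketch checks whether each intron in an annotated sequence starts with GT and ends with AG. The sequence, the exon coordinates, and the function names are hypothetical; the check ignores the branch site and the longer consensus around each junction.

```python
def follows_gt_ag_rule(intron: str) -> bool:
    """Return True if a DNA intron starts with GT and ends with AG."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def intron_boundaries_ok(gene: str, exons: list[tuple[int, int]]) -> list[bool]:
    """Check every intron lying between consecutive exons.

    exons: (start, end) pairs, 0-based, end-exclusive, in 5' to 3' order.
    """
    results = []
    for (_, donor), (acceptor, _) in zip(exons, exons[1:]):
        results.append(follows_gt_ag_rule(gene[donor:acceptor]))
    return results


# Hypothetical gene: the second intron starts with CT, so it fails the check.
gene = "ATGGCC" + "GTAAGTTTTTTTAG" + "GATTAC" + "CTGAGTCCCCCCAG" + "TGA"
print(intron_boundaries_ok(gene, [(0, 6), (20, 26), (40, 43)]))  # [True, False]
```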
Splice junction
(exon/intron boundary)
• Consensus sequences are conserved throughout eukaryotes.
• Conservation of sequence is expected, since the sequences are recognized by base pairing with the RNA component of snRNPs.
Secondary structure model of human U1 snRNP. The region where it recognizes the pre-mRNA is also shown.
Regulatory Sequences
• 5’ untranscribed region: signals for initiation and control of transcription
- Promoter/proximal elements
• Enhancer / Silencer
- Enhancer stimulates transcription
- Silencer inhibits transcription
• 3’ untranscribed region: signals for termination of transcription

Regulatory Sequences
Promoter/Proximal Elements
• Occur within ~200 bp of the start site.
• Each element contains up to ~20 bp.
• Cell-type specific
Basal Promoter Analysis
Element     Sequence     Position   Factor
TATA box    ATATAA       -30        TBP
CAAT box    GGCCAATC     -75        CTF/NF1
GC box      GCCACACCC    -90        SP1
Transcription start site: +1
Promoter-Proximal Elements
• TATA box
- Most common
- Highly transcribed genes
- 25-35 base pairs upstream of start site
• Initiator
- At start site
• GC boxes (CpG islands)
- “Housekeeping” genes (transcribed at low rate)
- Within ~100 base pairs of start site
TATA box
• ~25 bp upstream of +1
• Only promoter element that is relatively fixed in position relative to the start point
• Tends to be surrounded by GC-rich sequences
• Single base substitutions in the TATA box act as strong promoter-down mutations
• Some promoters do not contain a TATA box
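The positional constraint on the TATA box can be illustrated with a small search routine. This is a sketch under stated assumptions: the pattern `TATA[AT]A` is a simplified consensus chosen for illustration, the default search window of -40 to -20 is an arbitrary margin around the 25-35 bp range quoted in the slides, and the test sequence is hypothetical.

```python
import re

TATA_PATTERN = re.compile(r"TATA[AT]A")  # simplified consensus, an assumption


def find_tata_box(seq: str, tss: int, window=(-40, -20)):
    """Return the position (relative to +1) of the first TATA-like motif
    found in the given upstream window, or None if there is none.

    tss: 0-based index of the +1 nucleotide in seq.
    """
    start = max(0, tss + window[0])
    end = max(0, tss + window[1])
    match = TATA_PATTERN.search(seq.upper(), start, end)
    if match is None:
        return None
    return match.start() - tss  # upstream positions are negative


# Hypothetical promoter: a TATAAA motif 30 bp upstream of the +1 base.
seq = "GC" * 35 + "TATAAA" + "GC" * 12 + "ATGGCC"
print(find_tata_box(seq, tss=100))        # -30
print(find_tata_box("GC" * 53, tss=100))  # None
```

A promoter that returns `None` here is not necessarily inactive, since some promoters lack a TATA box altogether.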
Initiator
• Instead of a TATA box, some eukaryotic genes contain an alternative promoter element, called an initiator.
• The initiator is highly degenerate.
Consensus (the A is at +1):
5’ Y Y A N T/A Y Y Y
Y = pyrimidine (C or T)
N = any nucleotide
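The degenerate initiator consensus can be written as a regular expression. The sketch below translates Y, N, and T/A into character classes and reports the index of the A, which is the candidate +1 position; the function name and the test sequence are hypothetical.

```python
import re

# Y = C or T, N = any base, T/A = T or A.
# The lookahead lets overlapping matches be reported as well.
INR = re.compile(r"(?=([CT][CT]A[ACGT][AT][CT][CT][CT]))")


def initiator_sites(seq: str) -> list[int]:
    """Return 0-based indices of the A (the candidate +1) in each match."""
    return [m.start() + 2 for m in INR.finditer(seq.upper())]


print(initiator_sites("GGGTCAGTCTTGGG"))  # [5]
```

Because the consensus is so loose, matches like this occur frequently by chance, so a hit is only a candidate start site.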
CpG island
• Genes coding for intermediary metabolism are transcribed at low rates and do not contain a TATA box or initiator.
• Most genes of this type contain a CG-rich stretch of 20-50 nt within ~100 bp upstream of the start site region.
• A transcription factor called SP1 recognizes these CG-rich regions.
• This arrangement gives multiple alternative mRNA start sites.
[Figure: a CpG island within ~100 bp upstream of the mRNA, with multiple 5’ start sites.]
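A simple way to look for a CG-rich stretch upstream of a start site is a sliding-window GC count. In the sketch below the window length of 20 nt matches the lower bound quoted above, while the 0.7 threshold, the function names, and the test sequence are illustrative assumptions rather than values from the slides.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_rich_windows(seq, tss, window=20, upstream=100, threshold=0.7):
    """Yield (position relative to +1, GC fraction) for each GC-rich window
    lying entirely within `upstream` bp before the start site.

    threshold is an illustrative assumption, not a value from the source.
    """
    first = max(0, tss - upstream)
    for i in range(first, tss - window + 1):
        frac = gc_fraction(seq[i:i + window])
        if frac >= threshold:
            yield i - tss, frac


# Hypothetical promoter: a 20-nt GC-only block starting 50 bp upstream of +1.
seq = "AT" * 50 + "GGGCGGGGCGGGGCGGGGCG" + "AT" * 15 + "ATG"
hits = list(gc_rich_windows(seq, tss=150))
best = max(hits, key=lambda hit: hit[1])
print(best)  # (-50, 1.0)
```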
Enhancers
• Can be located several kb from the promoter
• Can be present in either orientation relative to the promoter
• Contain elements that bind inducible factors
• Usually ~100-200 bp long, containing multiple 8- to 20-bp control elements
• Targets for tissue-specific and/or temporal regulation
Enhancer
• Variable distance from promoter
• Either orientation
• Upstream or downstream of gene
TERMINATION
• RNA polymerase meets the terminator
• Terminator sequence (the polyadenylation signal in eukaryotic RNA): AAUAAA
• RNA polymerase releases from DNA
• Prokaryotes: releases at the termination signal
• Eukaryotes: the RNA is released by cleavage 10-35 nucleotides after the termination signal
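The relationship between the AAUAAA signal and the region 10-35 nucleotides after it can be made concrete with a short Python sketch. The function name, the coordinate convention (0-based, end-exclusive, counting from the first base after the signal), and the test RNA are assumptions made for illustration.

```python
def cleavage_windows(rna: str, signal="AAUAAA", min_gap=10, max_gap=35):
    """For each occurrence of the signal, return (signal_start, window_start,
    window_end): the 0-based, end-exclusive range covering nucleotides
    min_gap to max_gap after the signal's last base, clipped to the RNA.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input too
    hits = []
    pos = rna.find(signal)
    while pos != -1:
        signal_end = pos + len(signal)
        lo = min(len(rna), signal_end + min_gap - 1)
        hi = min(len(rna), signal_end + max_gap)
        hits.append((pos, lo, hi))
        pos = rna.find(signal, pos + 1)
    return hits


# Hypothetical 3' end: signal at index 3, followed by 50 filler bases.
rna = "GGG" + "AAUAAA" + "C" * 50
print(cleavage_windows(rna))  # [(3, 18, 44)]
```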
Termination
Different mechanisms of termination
• Prokaryotes
- rho-independent termination: formation of a hairpin structure
- rho-dependent termination: external protein disrupts transcription
• Eukaryotes
- cleavage of the RNA by an external protein
Rho-independent terminator
Distribution
• Different density of genes along a chromosome
• Different density of genes between chromosomes
(exon-intron-exon)n structure of various genes
Gene            Total length    Exon length
histone         400 bp          400 bp
β-globin        1,660 bp        990 bp
HGPRT (HPRT)    42,830 bp       1,263 bp
factor VIII     ~186,000 bp     ~9,000 bp
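The contrast between these genes is easier to see as the percentage of each gene that is exon sequence. The short Python calculation below uses only the lengths listed above; the dictionary layout and function name are illustrative, and the factor VIII figures are approximate in the source.

```python
# (total length, exon length) in bp, as listed above.
genes = {
    "histone": (400, 400),
    "beta-globin": (1660, 990),
    "HGPRT (HPRT)": (42830, 1263),
    "factor VIII": (186000, 9000),  # both values approximate in the source
}


def exon_percent(total_bp: int, exon_bp: int) -> float:
    """Percentage of the gene length that is exon sequence."""
    return 100.0 * exon_bp / total_bp


for name, (total, exon) in genes.items():
    print(f"{name:14s} {exon_percent(total, exon):5.1f}% exon")
```

The result ranges from 100% for the intronless histone gene down to about 3% for HGPRT, so the longer genes here are mostly intron.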
Genes
• Protein coding
• RNA genes
- rRNA
- tRNA
- snRNA, snoRNA…
“Average” gene organization
• Single, unique genes consisting of exons interrupted by introns only
Other gene organizations
• Dispersed gene segments brought together by genome reorganization in specialized cells
• Example: gene for the β T-cell receptor protein in T cells
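Light Chain Gene Families
Germ line gene organization
[Figure: lambda light chain genes; n=30. V segments V1, V2 ... Vn, each with its own L and P, followed by four J-C pairs (J1 C1, J2 C2, J3 C3, J4 C4), each followed by an E.]
[Figure: kappa light chain genes; n=300. V segments V1, V2 ... Vn, each with its own L and P, followed by J1 to J5, a single C, and an E.]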
Light Chain Gene Families
Gene rearrangement and expression
[Figure: germ line DNA (V1, V2 ... Vn, each with L and P; J1 to J5; C; E) undergoes DNA rearrangement that joins V2 to J4. Transcription gives the primary transcript RNA (L, V, J, C), RNA processing gives the mRNA (L V J C), translation gives the protein (L V J C), and transport to the ER gives the protein V J C, with its V and C regions.]
Heavy Chain Gene Family
Germ line gene organization
[Figure: heavy chain genes; Vn=1000, Dn=15. V segments V1, V2 ... Vn, each with L and P, followed by D1, D2, D3 ... Dn, then J1 to J5, an E, and a series of C genes. Inset: one C gene with exons CH1, H, CH2, CH3, CH4.]
Introns separate exons coding for H chain domains
Heavy Chain Gene Family
Gene rearrangement and expression
[Figure: germ line DNA (V1, V2 ... Vn; D1, D2, D3 ... Dn; J1 to J5; C genes). DJ rearrangement joins D2 to J4, leaving V1, V2 ... Vn, D1, D2, J4, J5, C. VDJ rearrangement then joins V2 to D2, leaving V1, V2, D2, J4, J5, C. Transcription gives the primary transcript RNA: L, V2, D2, J4, J5, C.]
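The two heavy chain rearrangement steps (DJ joining, then VDJ joining) can be mimicked on a list of segment names. This is a deliberately simplified sketch: the function `join_segments` is hypothetical, the locus is reduced to the segments named in the figure, and the L, P, and E elements and the multiple C genes are left out.

```python
def join_segments(locus: list[str], left: str, right: str) -> list[str]:
    """Join `left` directly to `right`, deleting every segment between them."""
    i, j = locus.index(left), locus.index(right)
    if i >= j:
        raise ValueError("left segment must lie upstream of right segment")
    return locus[: i + 1] + locus[j:]


germline = ["V1", "V2", "Vn", "D1", "D2", "D3", "Dn",
            "J1", "J2", "J3", "J4", "J5", "C"]

dj = join_segments(germline, "D2", "J4")   # DJ rearrangement
vdj = join_segments(dj, "V2", "D2")        # VDJ rearrangement

print(dj)   # ['V1', 'V2', 'Vn', 'D1', 'D2', 'J4', 'J5', 'C']
print(vdj)  # ['V1', 'V2', 'D2', 'J4', 'J5', 'C']
```

Each step deletes the intervening DNA, which is why the rearranged locus in a given cell can no longer use the deleted segments.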
Other gene organizations
Overlapping genes
Sequence: GTTTATGGTA
Gene 1 reads ATG GTA: met, val
Gene 2 reads GTT TAT GGT: val, tyr, gly
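The overlapping-gene example can be reproduced by translating the same ten nucleotides in two reading frames. The sketch below is minimal: the codon table holds only the five codons that occur in this sequence (so other codons would raise a `KeyError`), and the function name and offsets are chosen for illustration.

```python
# Partial codon table: only the codons present in this example.
CODONS = {"ATG": "met", "GTA": "val", "GTT": "val", "TAT": "tyr", "GGT": "gly"}


def translate(dna: str, offset: int) -> list[str]:
    """Translate complete codons starting at `offset` (0-based)."""
    codons = [dna[i:i + 3] for i in range(offset, len(dna) - 2, 3)]
    return [CODONS[c] for c in codons]


seq = "GTTTATGGTA"
print(translate(seq, 4))  # Gene 1: ['met', 'val']
print(translate(seq, 0))  # Gene 2: ['val', 'tyr', 'gly']
```

The two genes share the bases ATGGT but read them in different frames, so the same DNA specifies different amino acids.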
Other gene organizations
• Genes-within-genes
- It is not uncommon for short genes to be located inside an intron of another gene.
- Example: intron 26 of the NF1 gene contains three internal genes.
Other gene organizations
Gene families: functionally similar or identical genes repeated on the same or different chromosomes
• Example 1: genes for histones and ribosomal RNA (rRNA)
• Example 2: the globin families
• Gene families defined by conserved amino acid motifs
- DEAD box
- WD repeat families
• Clustered gene families
Family                         Copies
Growth hormone                 5 copies (67 kb)
α-globin                       7 copies (50 kb)
Hox genes (multi)              38 in four clusters
Olfactory receptors (large)    1000 in 25 clusters
• Interspersed gene families
Family                         Copies
Pax                            9 copies
Actin                          >20 copies
Alu elements (repeats)         1.1 million
LINE elements (L1)             200,000-500,000
Pseudogenes
Nonfunctional copies of genes
• Formed by duplication of an ancestral gene, or by reverse transcription (and integration)
• Not expressed due to mutations that produce a stop codon (nonsense or frameshift) or prevent mRNA processing, or due to lack of regulatory sequences
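One of the inactivating mutations mentioned above, a premature stop codon, is easy to detect computationally. The following Python sketch scans a coding sequence in frame and reports whether a stop codon appears before the final codon; both example sequences and the function names are hypothetical, and the check says nothing about processing defects or missing regulatory sequences.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(cds: str) -> Optional[int]:
    """Return the 0-based codon index of the first in-frame stop, or None."""
    cds = cds.upper()
    for n, i in enumerate(range(0, len(cds) - 2, 3)):
        if cds[i:i + 3] in STOP_CODONS:
            return n
    return None


def has_premature_stop(cds: str) -> bool:
    """True if an in-frame stop codon occurs before the last complete codon."""
    stop = first_stop_codon(cds)
    last = len(cds) // 3 - 1
    return stop is not None and stop < last


functional = "ATGGCCGATTACTGA"  # hypothetical: stop is the final codon
nonsense = "ATGGCCTAGTACTGA"    # hypothetical: third codon mutated to TAG

print(has_premature_stop(functional))  # False
print(has_premature_stop(nonsense))    # True
```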
