Transcript Gene
Human
Molecular Genetics
Institute of Medical Genetics
Outline of this chapter
Definition
Structure
Organization
Gene
Molecular definition:
DNA sequence encoding protein
What are the problems with this
definition?
Gene definition caveats
Some genomes are RNA instead of
DNA
Some gene products are RNA (tRNA,
rRNA, and others) instead of protein
Some nucleic acid sequences that do
not encode gene products
(noncoding regions) are necessary
for production of the gene product
(RNA or protein)
Gene
A gene is a segment of DNA encoding
information that leads to a functional product (RNA
or polypeptide chain).
The most important feature of a gene is that it must
code for a functional product.
Early estimates put the human genome at 30,000 to 35,000
genes; current annotations count roughly 20,000
protein-coding genes.
Hybridization of mRNA and
DNA
Eukaryotic genes are split genes
They include coding regions and noncoding regions.
A “Simple” Eukaryotic Gene
[Figure: a “simple” eukaryotic gene, drawn 5’ to 3’. Labeled parts: promoter/control region, transcription start site, 5’ untranslated region, exon 1, intron 1, exon 2, intron 2, exon 3, 3’ untranslated region, terminator sequence. Below it, the RNA transcript, 5’ to 3’: exon 1, intron 1, exon 2, intron 2, exon 3.]
Gene Structure
Exons
Introns
Splicing junction
Regulatory sequences
- Promoter/proximal control elements
- Enhancer/silencer
- Terminator
Exons
Segment of a gene which is decoded to
give an mRNA product or a mature RNA
product.
Individual exons may contain coding
DNA or noncoding DNA (untranslated
sequences, UTS).
Coding region
Nucleotides (open reading frame) encoding
the amino acid sequence of a protein
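A minimal Python sketch of what "open reading frame" means in practice. It assumes a DNA coding-strand string, the standard stop codons, and a reading frame that starts at the first ATG; the function name `find_orf` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orf(dna):
    """Return the first ATG-to-stop open reading frame in `dna`, or None."""
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        # Walk codon by codon from this ATG until an in-frame stop codon.
        for i in range(start, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return dna[start:i + 3]
        # No in-frame stop after this ATG: try the next ATG.
        start = dna.find("ATG", start + 1)
    return None

print(find_orf("GGATGGTATTTTAACC"))  # ATGGTATTTTAA
```

The ORF returned here includes the stop codon; real gene finders also consider both strands and all three frames.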
Introns
Noncoding DNA which separates
neighboring exons in a gene.
During gene expression introns, like exons,
are transcribed into RNA but the transcribed
intron sequences are subsequently removed
by RNA splicing and are not present in mRNA.
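Intron removal can be sketched as joining exon intervals of the primary transcript. This is only an illustration: the toy sequence and the exon coordinates (0-based, end-exclusive) are made up, and `splice` is a hypothetical helper.

```python
def splice(pre_mrna, exons):
    """Join exon intervals in order, dropping the intron sequences between them."""
    return "".join(pre_mrna[start:end] for start, end in exons)

# Toy pre-mRNA: exon 1, one intron (GU...AG), exon 2.
pre_mrna = "AUGGCC" + "GUAAGUUUCAG" + "GAAUAA"
exons = [(0, 6), (17, 23)]

print(splice(pre_mrna, exons))  # AUGGCCGAAUAA
```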
Splice junction
(exon/intron boundary)
Splice donor site: the junction between the end
of an exon and the start of the downstream intron;
the intron begins with the dinucleotide GT.
Splice acceptor site: the junction between the
end of an intron terminating in the dinucleotide
AG, and the start of the next exon.
Branch site: the third conserved intronic
sequence that is known to be functionally
important in splicing
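The GT/AG boundary dinucleotides can be checked with a few lines of Python. This sketch tests only the two dinucleotides on a DNA intron string; it does not model the branch site or the wider consensus, and the function name is hypothetical.

```python
def follows_gt_ag_rule(intron):
    """True if a DNA intron starts with GT (donor) and ends with AG (acceptor)."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")

print(follows_gt_ag_rule("GTAAGTTTCAG"))  # True
print(follows_gt_ag_rule("GCAAGTTTCAG"))  # False
```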
Consensus sequences are conserved
throughout eukaryotes
Conservation of sequence is expected,
since recognition of these sequences is
accomplished by base pairing with
the RNA component of snRNPs
Figure: Secondary structure model of human U1 snRNP. The region where it recognizes the pre-mRNA is also shown.
Regulatory Sequences
5’ untranscribed region. Signals for
initiation and control of transcription
- Promoter/proximal elements
- Enhancer/silencer
- Enhancer stimulates transcription
- Silencer inhibits transcription
3’ untranscribed region. Signals for
termination of transcription
Regulatory Sequences
Promoter/Proximal Elements
Occur within ~200 bp of the start site.
Contain up to ~20 bp.
Cell-type specific
Basal Promoter Analysis
Element: sequence, position, binding factor
TATA box: ATATAA, -30, TBP
CAAT box: GGCCAATC, -75, CTF/NF1
GC box: GCCACACCC, -90, SP1
Transcription start site: +1
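As an illustration, the three elements above can be located in an upstream sequence and reported relative to +1. The promoter string below is synthetic (T filler around the motifs), positions are assumed to refer to the first base of each motif, and `locate_elements` is a hypothetical helper.

```python
ELEMENTS = {
    "TATA box": "ATATAA",
    "CAAT box": "GGCCAATC",
    "GC box": "GCCACACCC",
}

def locate_elements(upstream):
    """Map element name to the position of its first base relative to +1.

    `upstream` is the sequence that ends at position -1.
    """
    hits = {}
    for name, motif in ELEMENTS.items():
        idx = upstream.find(motif)
        if idx != -1:
            hits[name] = idx - len(upstream)
    return hits

# Synthetic 100-bp upstream region with the motifs placed at -90, -75, -30.
upstream = ("T" * 10 + "GCCACACCC" + "T" * 6 + "GGCCAATC"
            + "T" * 37 + "ATATAA" + "T" * 24)

print(locate_elements(upstream))
# {'TATA box': -30, 'CAAT box': -75, 'GC box': -90}
```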
Promoter-Proximal Elements
TATA box
Most common
Highly transcribed genes
25~35 base pairs upstream of start site
Initiator
At start site
GC boxes (CpG islands)
“Housekeeping” genes (transcribed at low
rate)
Within ~100 base pairs of start site
TATA box
~ 25 bp upstream of +1
Only promoter element that is relatively
fixed in relation to start point
Tends to be surrounded by GC-rich
sequences
Single base substitutions in the TATA box
act as strong promoter-down mutations
Some promoters do not contain TATA
Initiator
Instead of a TATA box, some eukaryotic genes
contain an alternative promoter element,
called an initiator.
The initiator sequence is highly degenerate.
Consensus (the A is position +1):
5’ Y Y A N T/A Y Y Y 3’
Y = pyrimidine (C or T)
N = any
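Because the initiator is degenerate, it is easier to search for as a pattern than as a fixed string. The sketch below turns the consensus on this slide into a regular expression; the function name and the test sequence are made up.

```python
import re

# Y Y A N T/A Y Y Y, written as a lookahead so overlapping sites are found.
INR = re.compile(r"(?=([CT][CT]A[ACGT][AT][CT][CT][CT]))")

def initiator_sites(dna):
    """Return 0-based indices of the A (+1) of each initiator match."""
    return [m.start() + 2 for m in INR.finditer(dna.upper())]

print(initiator_sites("GGCCAGTCCCGG"))  # [4]
```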
CpG island
Genes coding for enzymes of intermediary metabolism are
transcribed at low rates, and do not contain a TATA
box or initiator.
Most genes of this type contain a CG-rich stretch of
20-50 nt within ~100 bp upstream of the start site
region.
A transcription factor called SP1 recognizes these
CG-rich regions.
This arrangement gives multiple alternative mRNA start sites.
[Figure: a CpG island within ~100 bp of the gene, giving rise to mRNAs with multiple 5’ start sites.]
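"CG-rich" can be made concrete with two simple measures: the GC fraction and the ratio of observed to expected CpG dinucleotides. The observed/expected formula used here is the common one (CpG count times length, divided by the product of C and G counts); it is brought in as an assumption and is not from these slides.

```python
def gc_content(dna):
    """Fraction of bases that are G or C (0.0 for an empty string)."""
    dna = dna.upper()
    if not dna:
        return 0.0
    return (dna.count("G") + dna.count("C")) / len(dna)

def cpg_obs_exp(dna):
    """Observed/expected CpG ratio: count(CG) * N / (count(C) * count(G))."""
    dna = dna.upper()
    c, g = dna.count("C"), dna.count("G")
    if c == 0 or g == 0:
        return 0.0
    return dna.count("CG") * len(dna) / (c * g)

print(gc_content("CGCGCGCG"), cpg_obs_exp("CGCGCGCG"))  # 1.0 2.0
print(gc_content("ATATATAT"), cpg_obs_exp("ATATATAT"))  # 0.0 0.0
```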
Enhancers
Can be located several kb from promoter
Can be present in either orientation
relative to the promoter
Contain elements that bind inducible
factors
Usually ~100-200 bp long, containing
multiple 8- to 20-bp control elements.
Targets for tissue specific and/or temporal
regulation
Enhancer
Variable distance from promoter
Either orientation
Upstream or downstream of gene
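"Either orientation" means a control element can be read on either strand. A minimal sketch of such a search follows; the element GGAAC and the test sequence are invented for illustration and do not correspond to a real factor binding site.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]

def find_element(dna, element):
    """Return sorted (index, orientation) hits for `element` on either strand."""
    dna = dna.upper()
    hits = []
    for label, motif in (("forward", element.upper()),
                         ("reverse", reverse_complement(element))):
        idx = dna.find(motif)
        while idx != -1:
            hits.append((idx, label))
            idx = dna.find(motif, idx + 1)
    return sorted(hits)

print(find_element("TTGGAACTTTTGTTCCTT", "GGAAC"))
# [(2, 'forward'), (11, 'reverse')]
```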
TERMINATION
• RNA polymerase meets the terminator
• Eukaryotic signal sequence in the transcript: AAUAAA
(the polyadenylation signal)
• RNA polymerase releases from DNA
• Prokaryotes: release occurs at the termination
signal
• Eukaryotes: the transcript is cleaved 10-35 nucleotides
downstream of the signal
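The AAUAAA signal and the 10-35 spacing listed above can be sketched as a window calculation on an RNA string. The spacing is measured here from the end of the signal, which is an assumption; `cleavage_window` is a hypothetical helper.

```python
def cleavage_window(rna, signal="AAUAAA", min_gap=10, max_gap=35):
    """Return (start, end) indices of the region 10-35 nt past the signal.

    The gap is counted from the last base of the signal (an assumption).
    Returns None if the signal is absent.
    """
    idx = rna.upper().find(signal)
    if idx == -1:
        return None
    signal_end = idx + len(signal)
    # Clip the window to the end of the RNA.
    return (min(signal_end + min_gap, len(rna)),
            min(signal_end + max_gap, len(rna)))

rna = "GG" + "AAUAAA" + "C" * 50
print(cleavage_window(rna))  # (18, 43)
```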
Termination
Different mechanisms of termination
Prokaryotes
rho-independent termination: formation of
a hairpin structure
rho-dependent termination: external
protein disrupts transcription
Eukaryotes
cleavage of the RNA by an external protein
Rho-independent terminator
Distribution
Different density of genes along a
chromosome
Different density of genes between
chromosomes
(exon-intron-exon)n structure of
various genes
histone: total = 400 bp; exons = 400 bp
β-globin: total = 1,660 bp; exons = 990 bp
HGPRT (HPRT): total = 42,830 bp; exons = 1,263 bp
factor VIII: total = ~186,000 bp; exons = ~9,000 bp
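The figures above make the point more clearly as exon fractions. A short calculation using only the sizes listed on this slide (the factor VIII values are the approximate ones given):

```python
# (total gene length in bp, summed exon length in bp), from the list above
GENES = {
    "histone": (400, 400),
    "beta-globin": (1660, 990),
    "HPRT": (42830, 1263),
    "factor VIII": (186000, 9000),
}

def exon_fraction(total_bp, exon_bp):
    """Fraction of the gene that is exon sequence."""
    return exon_bp / total_bp

for name, (total, exon) in GENES.items():
    print(f"{name}: {exon_fraction(total, exon):.1%} exon")
```

The exon share falls from 100% for the intronless histone gene to under 5% for HPRT and factor VIII.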
Genes
Protein Coding
RNA genes
rRNA
tRNA
snRNA, snoRNA…
“Average” gene organization
Single, unique genes consisting of
exons interrupted by introns only
Other gene organizations
Dispersed gene segments brought
together by genome reorganization
in specialized cells
Example: gene for the β chain of the T-cell receptor
protein in T cells
Light Chain Gene Families
Germ line gene organization
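Lambda light chain genes; n=30
[Figure: V segments V1, V2 ... Vn, each with its own promoter (P) and leader (L), followed by four J-C pairs (J1-C1, J2-C2, J3-C3, J4-C4) with enhancers (E).]
Kappa light chain genes; n=300
[Figure: V segments V1, V2 ... Vn, each with its own promoter (P) and leader (L), followed by J1 to J5, a single C, and an enhancer (E).]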
Light Chain Gene Families
Gene rearrangement and expression
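[Figure: DNA rearrangement joins one V segment to one J segment (V2 to J4 in the example), leaving V1, the joined V2-J4, J5, C, and the enhancer (E) in the DNA. Transcription gives the primary RNA transcript; RNA processing gives the mRNA L-V-J-C; translation gives the protein L-V-J-C; transport to the ER gives the protein V-J-C, made up of a V region and a C region.]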
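The light chain pathway can be sketched with gene segments as list items. This is a toy model of a kappa-like locus with a single C segment and only three V segments; the function names are hypothetical, and real rearrangement involves recombination signals and junctional changes that are not modeled.

```python
def vj_rearrange(v_segments, j_segments, v, j):
    """Join V segment `v` to J segment `j`, deleting the DNA between them."""
    vi, ji = v_segments.index(v), j_segments.index(j)
    return v_segments[:vi + 1] + j_segments[ji:] + ["C"]

def expression_products(rearranged, v, j):
    """Return (primary transcript, mRNA, mature protein) as segment lists."""
    start = rearranged.index(v)
    primary = ["L"] + rearranged[start:]   # leader, then everything downstream
    mrna = ["L", v, j, "C"]                # unused J segments spliced out
    protein = [v, j, "C"]                  # leader removed on transport to the ER
    return primary, mrna, protein

dna = vj_rearrange(["V1", "V2", "V3"], ["J1", "J2", "J3", "J4", "J5"], "V2", "J4")
print(dna)  # ['V1', 'V2', 'J4', 'J5', 'C']
print(expression_products(dna, "V2", "J4"))
```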
Heavy Chain Gene Family
Germ line gene organization
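Heavy chain genes; Vn=1000, Dn=15
[Figure: V segments V1, V2 ... Vn, each with a promoter (P) and leader (L); D segments D1, D2, D3 ... Dn; J segments J1 to J5; an enhancer (E); and a series of C genes. Each C gene is split into exons CH1, H, CH2, CH3, CH4.]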
Introns separate exons coding for H chain domains
Heavy Chain Gene Family
Gene rearrangement and expression
[Figure: DJ rearrangement joins a D segment to a J segment (D2 to J4 in the example). VDJ rearrangement then joins a V segment to the DJ unit (V2-D2-J4). Transcription of the rearranged DNA gives the primary RNA transcript containing L, V2-D2-J4, J5, and C.]
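The heavy chain differs from the light chain in needing two joining steps, DJ first and then VDJ. A toy sketch of that order, using a shortened germline list (three V and three D segments, one C) and a hypothetical `join` helper:

```python
def join(segments, left, right):
    """Delete everything between `left` and `right`, keeping both segments."""
    li, ri = segments.index(left), segments.index(right)
    if li >= ri:
        raise ValueError("left segment must lie upstream of right segment")
    return segments[:li + 1] + segments[ri:]

germline = ["V1", "V2", "V3", "D1", "D2", "D3",
            "J1", "J2", "J3", "J4", "J5", "C"]

dj = join(germline, "D2", "J4")   # step 1: DJ rearrangement
vdj = join(dj, "V2", "D2")        # step 2: VDJ rearrangement

print(dj)   # ['V1', 'V2', 'V3', 'D1', 'D2', 'J4', 'J5', 'C']
print(vdj)  # ['V1', 'V2', 'D2', 'J4', 'J5', 'C']
```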
Other gene organizations
Overlapping genes
Sequence: G T T T A T G G T A
Gene 1 reads ATG GTA: met, val
Gene 2 reads GTT TAT GGT: val, tyr, gly
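The overlap can be verified by translating the same ten bases in two reading frames. The codon table below holds only the five codons needed for this sequence; `translate` is a hypothetical helper.

```python
# Only the codons that occur in this example (standard genetic code).
CODONS = {"ATG": "met", "GTA": "val", "GTT": "val", "TAT": "tyr", "GGT": "gly"}

def translate(dna, start, n_codons):
    """Translate `n_codons` codons of `dna` beginning at 0-based index `start`."""
    return [CODONS[dna[i:i + 3]] for i in range(start, start + 3 * n_codons, 3)]

seq = "GTTTATGGTA"
print(translate(seq, 4, 2))  # Gene 1: ['met', 'val']
print(translate(seq, 0, 3))  # Gene 2: ['val', 'tyr', 'gly']
```

The two genes use frames offset by one base, so the same nucleotides code for different amino acids.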
Other gene organizations
Genes-within-genes
It is not uncommon that short genes are
located inside an intron of another gene
Intron 26 of the NF1 gene contains
three internal genes.
Other gene organizations
Gene families: functionally similar or
identical genes repeated on the same
or different chromosomes
Example 1: genes for histones and
ribosomal RNA (rRNA)
Example 2: The globin families
Gene families defined by conserved amino acid motifs
DEAD box
WD repeat families
Clustered gene families
Growth hormone: 5 copies (67 kb)
α-globin: 7 copies (50 kb)
Hox genes (multi): 38 in four clusters
Olfactory receptors (large): 1000 in 25 clusters
Interspersed gene families
Pax: 9 copies
Actin: >20 copies
Alu elements (repeats): 1.1 million
LINE elements (L1): 200,000-500,000
Pseudogenes
Nonfunctional copies of genes
Formed by duplication of ancestral
gene, or reverse transcription (and
integration)
Not expressed due to mutations that
produce a stop codon (nonsense or
frameshift) or prevent mRNA
processing, or due to lack of
regulatory sequences
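One of the inactivating changes listed above, a premature stop codon, is easy to detect in a coding sequence. The sketch assumes the sequence starts in frame at the first codon; both example sequences are invented.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_stop_codon(cds):
    """Return the 0-based codon index of the first in-frame stop, or None."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None

def has_premature_stop(cds):
    """True if an in-frame stop occurs before the final codon."""
    stop = first_stop_codon(cds)
    return stop is not None and stop < len(cds) // 3 - 1

functional = "ATGGTATTTGGTTAA"   # ATG GTA TTT GGT TAA
pseudogene = "ATGGTATAGGGTTAA"   # TTT -> TAG: nonsense mutation

print(has_premature_stop(functional))  # False
print(has_premature_stop(pseudogene))  # True
```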