Genomics – The Language of DNA
The structure and
meaning of the human genome
Genes span about 25% of the genome.
Coding sequences (the exons) amount to
about 1%.
The intragenic regions, or introns,
amount to the remainder of that span.
Intergenic sequences are non-coding
and lie between genes.
About 20-30% of the genes are
clustered in CpG islands.
There are gene deserts that
comprise about 20% of the genome.
There are also sequences that
are made of heterochromatin
(centromeres and telomeres).
Highly repetitive: About 10-15% of
mammalian DNA reassociates very
rapidly. This class includes tandem
repeats.
Moderately repetitive: Roughly 25-40% of
mammalian DNA reassociates
at an intermediate rate. This class
includes interspersed repeats.
Single copy (or very low copy
number): This class accounts for
50-60% of mammalian DNA.
The size of a satellite DNA ranges from
100 kb to over 1 Mb. In humans, a well-known
example is the alphoid DNA located
at the centromere of all chromosomes.
Its repeat unit is 171 bp and the repetitive
region accounts for 3-5% of the DNA in
each chromosome. Other satellites have a
shorter repeat unit.
Most satellites in humans or in other
organisms are located at the centromere.
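As a rough worked example of the satellite figures above, the number of 171 bp alphoid units in an array can be estimated by dividing the array size by the unit length. This is only a sketch: it assumes 1 kb = 1,000 bp and a perfectly regular, uninterrupted array, and the function name is made up for illustration.

```python
ALPHOID_UNIT_BP = 171  # repeat unit length quoted in the text


def copies_in_array(array_bp, unit_bp=ALPHOID_UNIT_BP):
    """Whole repeat units that fit in an array of `array_bp` base pairs."""
    return array_bp // unit_bp


# Array sizes quoted in the text: 100 kb and 1 Mb (assuming 1 kb = 1,000 bp).
for array_bp in (100_000, 1_000_000):
    print(array_bp, "bp ->", copies_in_array(array_bp), "copies")
```

Under these assumptions, a satellite array would hold somewhere between several hundred and several thousand copies of the alphoid unit.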
The size of a minisatellite ranges
from 1 kb to 20 kb. One type of
minisatellite is the variable
number of tandem repeats
(VNTR). Its repeat unit ranges
from 9 bp to 80 bp. VNTRs are
located in non-coding regions. The
number of repeats for a given
minisatellite may differ between
individuals. This feature is the basis
of DNA fingerprinting.
Variable Number of Tandem
Repeats (VNTR) Polymorphism
VNTRs may result from unequal
crossover. They are the molecular
basis of DNA fingerprinting,
which has many practical
applications.
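The idea that individuals differ in the number of repeats at a VNTR locus can be sketched in a few lines of Python. This is a toy illustration, not a forensic protocol: the 9 bp repeat unit, the flanking sequences, and the function name are all hypothetical.

```python
def max_tandem_copies(sequence, unit):
    """Longest run of back-to-back copies of `unit` found in `sequence`."""
    if not unit:
        raise ValueError("unit must be a non-empty string")
    best = 0
    for start in range(len(sequence)):
        copies = 0
        pos = start
        # Count consecutive copies of the unit beginning at this position.
        while sequence.startswith(unit, pos):
            copies += 1
            pos += len(unit)
        best = max(best, copies)
    return best


# Hypothetical 9 bp repeat unit and flanking sequences, made up for illustration.
UNIT = "GGATCCTGA"
person_a = "TTACG" + UNIT * 4 + "CCGTA"
person_b = "TTACG" + UNIT * 7 + "CCGTA"

profile = {name: max_tandem_copies(seq, UNIT)
           for name, seq in (("A", person_a), ("B", person_b))}
print(profile)  # {'A': 4, 'B': 7}
```

The two toy individuals carry the same locus but different copy numbers, which is the kind of difference a DNA fingerprint compares across several loci.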
Another type of minisatellite is
the telomere. In a human germ
cell, the size of a telomere is
about 15 kb. In an aging
somatic cell, the telomere is
shorter. The telomere contains
a tandemly repeated sequence.
Microsatellites are also known as
short tandem repeats (STR),
because a repeat unit consists of
only 1 to 6 bp and the whole
repetitive region spans less than 150
bp.
Similar to minisatellites, the number
of repeats for a given microsatellite
may differ between individuals, so
microsatellites can also be used for
DNA fingerprinting.
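The definition above (a unit of 1 to 6 bp, repeated in tandem, spanning less than 150 bp) can be turned into a simple scan. The sketch below uses a regular expression with a back-reference. The minimum of three copies, the function name, and the toy sequence are assumptions made for illustration, and the pattern always reports the shortest unit that fits.

```python
import re


def find_strs(sequence, min_copies=3, max_span_bp=150):
    """Return (start, unit, copies) for tandem repeats with a 1-6 bp unit."""
    # Group 1 is the repeat unit; \1{n,} requires n further copies of it.
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for match in pattern.finditer(sequence):
        unit = match.group(1)
        span = len(match.group(0))
        if span < max_span_bp:  # the text puts the whole region under 150 bp
            hits.append((match.start(), unit, span // len(unit)))
    return hits


# Toy sequence, made up for illustration: (CA)x5 and (GATA)x4.
toy = "GGTCACACACACATTGATAGATAGATAGATACC"
print(find_strs(toy))  # [(3, 'CA', 5), (15, 'GATA', 4)]
```

Running the same scan on the same locus from two individuals and comparing the copy counts gives a microsatellite analogue of the VNTR comparison.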
Miniature Inverted-repeat Transposable
Elements (MITEs)
MITEs are almost identical sequences of about
400 base pairs flanked by
characteristic inverted repeats of
about 15 base pairs.
Transposons are segments of DNA
that can move around to different
positions in the genome of a single
cell. In the process, they may
increase (or decrease) the amount of
DNA in the genome.
These mobile segments of DNA are
sometimes called "jumping genes".
Class II transposons consist only
of DNA that moves directly from
place to place.
Class III transposons are also known
as Miniature Inverted-repeat
Transposable Elements or MITEs.
Retrotransposons (Class I)
first transcribe the DNA into RNA and then
use reverse transcriptase to make a
DNA copy of the RNA to insert in a new
location.
Transposase binds to:
Both ends of the transposon, which
consist of inverted repeats; that is,
identical sequences reading in
opposite directions.
A sequence of DNA that makes up
the target site. Some transposases
require a specific sequence as their
target site; others can insert the
transposon anywhere in the genome.
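"Identical sequences reading in opposite directions" means that, written on a single strand, one end of the element is the reverse complement of the other. A minimal sketch of that check follows. The toy element and the function names are hypothetical, and real inverted repeats are often imperfect, which this exact-match test ignores.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def has_inverted_repeat_ends(element, repeat_len):
    """True if the first `repeat_len` bases mirror the last `repeat_len` bases."""
    if repeat_len <= 0:
        raise ValueError("repeat_len must be positive")
    left = element[:repeat_len]
    right = element[-repeat_len:]
    return left == reverse_complement(right)


# Hypothetical element: a 9 bp end, an arbitrary body, and the mirrored end.
left_end = "CAGGGATGA"
toy_transposon = left_end + "TTTCCCGGGAAATTT" + reverse_complement(left_end)

print(has_inverted_repeat_ends(toy_transposon, len(left_end)))  # True
print(has_inverted_repeat_ends("ACGTAAAAAAAACCCC", 4))          # False
```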
The human genome contains some
850,000 LINEs (representing some
21% of the genome).
Most of these belong to a family
called LINE-1 (L1).
These L1 elements are DNA
sequences that range in length from
a few hundred to as many as 9,000
base pairs.
Only about 50 L1 elements are
functional "genes"; that is, can be
transcribed and translated.
The functional L1 elements are
about 6,500 bp in length and
encode three proteins, including:
An endonuclease that cuts DNA.
A reverse transcriptase that
makes a DNA copy of an
RNA transcript.
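The LINE copy number and genome fraction quoted above allow a back-of-the-envelope estimate of the average length of a LINE copy. The genome size used here (about 3.2 billion bp for the haploid human genome) is an assumption that does not come from the text, and the function name is illustrative.

```python
GENOME_BP = 3.2e9      # assumption: approximate haploid human genome size
LINE_COPIES = 850_000  # from the text
LINE_FRACTION = 0.21   # from the text


def mean_copy_length(genome_bp, fraction, copies):
    """Average length in bp of one copy, given the total fraction occupied."""
    return genome_bp * fraction / copies


avg = mean_copy_length(GENOME_BP, LINE_FRACTION, LINE_COPIES)
print(round(avg))  # 791
```

Under that assumption the average copy comes out at roughly 800 bp, well below the 6,500 bp of a functional L1 element, which fits the statement that only a small number of copies are functional.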
L1 activity proceeds as follows:
RNA polymerase II transcribes the L1
DNA into RNA.
The RNA is translated by ribosomes in the
cytoplasm into the proteins.
The proteins and RNA join together and
reenter the nucleus.
The endonuclease cuts a strand of "target"
DNA, often in the intron of a gene.
The reverse transcriptase copies the L1
RNA into L1 DNA which is inserted into
the target DNA, forming a new L1 element.
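The steps above amount to a copy-and-paste cycle: the original element stays in place and a new copy appears elsewhere. The sketch below models that on plain strings. It is a deliberate simplification (strand complementarity, translation, and nuclear import are ignored), and the toy element, genome, and function names are hypothetical.

```python
def transcribe(dna):
    """RNA copy of the element, written as the coding strand with U for T."""
    return dna.replace("T", "U")


def reverse_transcribe(rna):
    """DNA copy of the RNA (U back to T); strand details are ignored."""
    return rna.replace("U", "T")


def retrotranspose(genome, element, cut_site):
    """Insert a new copy of `element` at `cut_site`; the original is untouched."""
    rna = transcribe(element)           # RNA polymerase II step
    new_copy = reverse_transcribe(rna)  # reverse transcriptase step
    # The endonuclease cut is modeled as splitting the genome at cut_site.
    return genome[:cut_site] + new_copy + genome[cut_site:]


L1_TOY = "GATTACAT"  # hypothetical stand-in for an L1 element
genome = "CCCC" + L1_TOY + "GGGGAAAA"
after = retrotranspose(genome, L1_TOY, cut_site=len(genome) - 4)

print(after.count(L1_TOY), len(after) - len(genome))  # 2 8
```

The genome grows by the length of the element each time, which is how this class of transposon increases the amount of DNA in the genome.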
SINEs (Short interspersed elements)
SINEs are short DNA sequences
(100–400 base pairs) that represent
reverse-transcribed RNA molecules
originally transcribed by RNA
polymerase III; that is, molecules of
tRNA, 5S rRNA, and some other
small nuclear RNAs. The most
abundant SINEs are the Alu
elements. There are over one million
copies in the human genome
(representing about 11% of the total
DNA).
Alu elements consist of a sequence
of 300 base pairs containing a site
that is recognized by the restriction
enzyme AluI. They appear to be
reverse transcripts of 7S RNA, part
of the signal recognition particle.
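Locating restriction sites such as the AluI site is a plain substring search. The sketch below assumes the AluI recognition sequence AGCT, which is not stated in the text; the toy sequence and function name are made up for illustration.

```python
ALUI_SITE = "AGCT"  # AluI recognition sequence (assumption: not given in the text)


def find_sites(sequence, site=ALUI_SITE):
    """0-based start positions of every occurrence of `site`, overlaps included."""
    positions = []
    pos = sequence.find(site)
    while pos != -1:
        positions.append(pos)
        pos = sequence.find(site, pos + 1)
    return positions


# Toy sequence, made up for illustration.
toy_dna = "GGCCAGCTTTGACCAGCTAA"
print(find_sites(toy_dna))  # [4, 14]
```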
Most SINEs do not encode any
functional molecules and depend on
the machinery of active L1 elements
to be transposed; that is, copied and
pasted in new locations.