Transcript Slide 1

The Nature of a GENE;
Component Parts
Susquehanna MAGNET School for Medicine and
Health Sciences
October 7, 2013
Professor Michael Chorney
Learning Objectives
Explain the nature of codons and the information inherent in
their base composition
Explain what is meant by degeneracy
Describe the nature of a gene, its component parts and
hallmarks
Discuss the nature of exons, and what is meant by an open
reading frame; in addition, define what is meant by
exon splicing
DNA Scale, the numbers, Review
We have a lot of DNA in each gamete (3.1647 x 10^9 basepairs); cf. a
bacterium, which contains about 4.6 million basepairs
200 phonebooks the size of Manhattan’s (1,000 pages) would be
required to tally the information; if you read the bases nonstop, it
would take 9.5 years to complete
The fertilized ovum contains twice as much DNA as each gamete, or
(6.4 x 10^9 basepairs); half maternal, half paternal*
*
• The body contains 10-100 trillion cells = 10^14
• A cell contains 12 picograms of DNA = 12 x 10^-12 g = 6 x 10^9 basepairs (G, A, T, C)
• The body contains 10^14 cells x 12 x 10^-12 g = 1,200 g of DNA = 0.25% wt = 6.4 x 10^22 basepairs
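The scale figures above can be rechecked with a few lines of arithmetic. This is a minimal sketch that uses only numbers quoted on the slide; the variable names are illustrative, and the choice of 10^14 cells (the upper end of the 10-100 trillion range) is an assumption.

```python
# Figures taken from the slide; names are illustrative only.
HAPLOID_BP = 3.1647e9        # basepairs in one gamete
CELLS_IN_BODY = 1e14         # assumed: upper end of the 10-100 trillion range
DNA_PER_CELL_G = 12e-12      # 12 picograms, the slide's per-cell figure
READING_YEARS = 9.5          # time quoted to read one gamete's bases nonstop

# A fertilized ovum carries one maternal and one paternal copy.
diploid_bp = 2 * HAPLOID_BP

# Total DNA mass = number of cells x DNA mass per cell.
total_dna_g = CELLS_IN_BODY * DNA_PER_CELL_G

# Reading rate implied by the 9.5-year figure (bases per second).
seconds = READING_YEARS * 365.25 * 24 * 3600
bases_per_second = HAPLOID_BP / seconds

print(f"diploid genome: {diploid_bp:.2e} basepairs")
print(f"total DNA mass: {total_dna_g:,.0f} g")
print(f"implied reading rate: {bases_per_second:.1f} bases per second")
```

The basepair total for the whole body scales directly with the cell count chosen, so it shifts tenfold across the 10-100 trillion range.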
DNA Scale, the numbers.2, Review
The DNA in each somatic cell is arranged into chromosomes, i.e., linear
strands of DNA of varying lengths
The DNA is condensed by proteins of opposite charge, called histones,
which provides a means for regulating base (information) access by
other proteins
Condensed DNA, during mitosis, can be easily stained, revealing the
chromosomes’ size and banding variation (reflected in the variation of
A/T and C/G content)
Cytogenetics
Giemsa-stained metaphase spread of human chromosomes from
one cell, the most condensed form of DNA within the cell, seen at
MITOSIS
Vocabulary
metacentric
sub-metacentric
acro(telo)centric
centromere
p arm
q arm
banding
heterochromatin
euchromatin
telomeres
autosome
Each chromosome is a linear strand of
helical ds-DNA with capped ends called
telomeres
Information flow
Sense strand
A GENE
TRANSCRIPT
Figure 6-2 Molecular Biology of the Cell (© Garland Science 2008)
Anti-sense
strand
Like copying the leading strand
Figure 6-21 Molecular Biology of the Cell (© Garland Science 2008)
RNA polymerase
substitutes U for T (why?) in
RNA and ribose for
the deoxy sugar
Deaminated C=U
Figure 6-4 Molecular Biology of the Cell (© Garland Science 2008)
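The relationship between the sense strand, the anti-sense (template) strand and the transcript can be sketched in code. The example below is a minimal illustration, not a model of the polymerase: the function names and the short test sequence are assumptions, and both strands are written 5' to 3'.

```python
# DNA base pairing, and the DNA-to-RNA pairing used when copying a template.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the opposite DNA strand, written 5' to 3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))


def transcribe_from_template(template):
    """Copy the anti-sense (template) strand into RNA, written 5' to 3'."""
    return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(template))


sense = "ATGGTGCACCTG"                  # illustrative sense-strand fragment
template = reverse_complement(sense)    # the anti-sense strand
transcript = transcribe_from_template(template)

print("sense     :", sense)
print("template  :", template)
print("transcript:", transcript)
```

The transcript carries the same base order as the sense strand, with U wherever the sense strand has T.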
DNA, the Puzzle, Review
Only a small amount (percentage) of human DNA contains information that
is ostensibly converted into proteins: these sequences are associated with
genes. The proteins coded for by genes do biochemical work and regulate
cell division, generate energy, respond to the environment, provide
immunity to invasive DNA sequences (infection), etc.
What (and where) is this information we keep hearing so
much about?
For starters,
It resides in the bases: particular triplet base
combinations, called codons, which comprise the exons and
provide the information
You should have been exposed to the list of codons
last week in the case study, next slide
There are 64 codons: 61 equate to the twenty amino
acids (a.a.’s), with multiple codons existing for most of
the a.a.’s, a property called degeneracy. The remaining three
are termination codons, more later
CODONS, what do you notice?
Figure 6-50 Molecular Biology of the Cell (© Garland Science 2008)
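One way to see the pattern in the codon table is to build it and count how many codons map to each amino acid. The sketch below uses the standard genetic code written as a 64-letter string in T, C, A, G order, with one-letter amino acid codes and `*` for termination; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map each of the 64 DNA codons to its amino acid (or '*' for stop).
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Degeneracy: number of codons that specify each amino acid.
degeneracy = Counter(CODON_TABLE.values())
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")

print("total codons:", len(CODON_TABLE))
print("termination codons:", stop_codons)
for aa, count in sorted(degeneracy.items(), key=lambda item: -item[1]):
    if aa != "*":
        print(aa, count)
```

Leucine, serine and arginine each have six codons, while methionine and tryptophan have only one.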
The question for
molecular biologists:
What distinguishes
a gene (1-2% of DNA)
from the remaining
DNA (98%)?
This has posed a
problem for some
time; now that it is
being solved, the
question becomes,
what does the ‘gene’
do?
Figure 4-7 Molecular Biology of the Cell (© Garland Science 2008)
Can you see, at
a quick glance, a
gene in the
sequence at the
left?
The yellow highlighted
bases signify the
beta globin gene!!!
Genes are subject to the following:
1. They must be recognized by a polymerase, that
is, an RNA polymerase that carries out the gene copying
called TRANSCRIPTION (compare DNA polymerase)
2. The collective DNA sequence that summons forth
RNA polymerase is called a PROMOTER
3. The information copied into RNA immediately
adjacent to the promoter must be readable
(CODING SEQUENCE); i.e. no stop codons until
the naturally determined end of translation
4. There has to be a place after the coding sequence
that signals the end of transcription, which is distinct from the
end of translation
The eukaryotic gene’s general features and processing characteristics
(Diagram, 5’ to 3’: p, exon [ATG], AG/GT…A…AG/G, exon, AG/GT…A…AG/G, exon, AG/GT…A…AG/G, exon [STOP], 3’UTR with AATAAA)
The gene is controlled by a promoter (p) which is not simple – there are
generalized transcription factors and more gene-specific ones that may reside
outside of the promoter proper, within the gene, within the 3’ end of the gene
or even far 5’ and/or 3’ of the gene itself –they open the DNA and expose sites
The gene is structured in ‘staccato,’ with coding sequence (exons) interrupted by
noncoding intervening sequences, called introns; the first exon begins with the ATG
met codon, the last exon ends with one of three translational termination codons
(TAA, TAG, TGA)
Termination of transcription occurs in the 3’ untranslated region (3’UTR) which
possesses termination signals and an RNA domain which drives 3’ processing, the
AATAAA polyadenylation signal
Exon-intron borders possess sequences which aid in splicing, AG/GT……A……AG/G
along with small, nuclear RNAs forming the spliceosome
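The exon-intron layout described above can be sketched with a toy gene. Everything in this example is made up for illustration: the sequences, the `splice` function and the coordinate list are assumptions. The only rules it encodes are that each intron begins with GT and ends with AG, and that the joined exons run from ATG to a termination codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Toy exons and introns (hypothetical sequences, not a real gene).
EXONS = ["ATGCAG", "GCTGAG", "GATTAA"]
INTRONS = ["GTAAGTCTAACTTTCAG", "GTGAGTACTAATCCTTAG"]

# Assemble the gene and remember where each exon sits (half-open spans).
gene = ""
exon_spans = []
for i, exon in enumerate(EXONS):
    exon_spans.append((len(gene), len(gene) + len(exon)))
    gene += exon
    if i < len(INTRONS):
        gene += INTRONS[i]


def splice(gene_seq, spans):
    """Join the exons, checking each removed intron for GT...AG borders."""
    for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
        intron = gene_seq[prev_end:next_start]
        if not (intron.startswith("GT") and intron.endswith("AG")):
            raise ValueError(f"intron lacks GT...AG borders: {intron}")
    return "".join(gene_seq[start:end] for start, end in spans)


spliced = splice(gene, exon_spans)
print("gene   :", gene)
print("spliced:", spliced)
print("starts with ATG:", spliced.startswith("ATG"))
print("ends with stop :", spliced[-3:] in STOP_CODONS)
```

Real splice-site recognition depends on the spliceosome and on more sequence context than the two border dinucleotides checked here.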
(Diagram: 5’UTR, exon 1, exon 2, exon 3, exon 4, 3’UTR)
CpG Islands: clusters of the otherwise under-represented CpG dinucleotide, found at the 5’ end of eukaryotic genes
(Diagram, 5’ to 3’: p, exon [AUG], AG/GT…A…AG/G, exon, AG/GT…A…AG/G, exon, AG/GT…A…AG/G, exon [STOP], 3’UTR with AATAAA; labels: CH3ase, [CG])
Maintaining DNA in a euchromatic state also rests upon factors that bind to C’s and G’s, which
protect the CpG ‘islands’ from cytosine methylases best known for their role in
imprinting
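A common way to express how under- or over-represented CpG is in a stretch of DNA is the observed/expected ratio: the number of CG dinucleotides seen, divided by the number expected from the C and G content alone. The sketch below assumes that measure; the function name and both test sequences are invented for illustration.

```python
def cpg_observed_over_expected(seq):
    """Observed CpG count divided by the count expected from C and G content."""
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    observed = seq.count("CG")
    expected = c_count * g_count / len(seq)
    return observed / expected


# Hypothetical sequences: one CpG-rich, one with C and G but no CpG at all.
island_like = "CGCGGCGCCGCGTACGCG"
cpg_depleted = "CAGTCTGACCTGGATCAGTC"

print("island-like :", round(cpg_observed_over_expected(island_like), 2))
print("CpG-depleted:", round(cpg_observed_over_expected(cpg_depleted), 2))
```

Sequences this short are only for demonstration; in practice the ratio is computed over windows of hundreds of basepairs.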
Let’s try a poor analogy, constrained by the English
language and a dearth of three-letter words, but
Here goes….
Find the three letter (codon)-containing ‘exons’ that
make a kind of a sensible phrase (names included)
This is comparable to an open reading frame
Word
DNA
…..Wlsjeutlsjimsatouttutyecmdsisladksltkald
Thedayforeeeuslkeiandseveeubhismomand
ttugosocunntewherebudtedandtueislsiecn
Tisnggotallsixeooaltaxlekqzztiellforthebigbadsum
rrrrrrrrrrrrteidas………
Answer: jimsatoutthedayforhismomandbudted
andgotallsixforthebigbadsum……….
jim sat out the day for his mom and bud ted
and got all six for the big bad sum……….
What happens if I delete the s in ‘his’?
Jim sat out the day for him oma ndb udt eda
ndg ota lls ixf ort heb igb ads um……….
FRAMESHIFT—the OPEN READING FRAME
IS GONE
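The frameshift in the word analogy can be reproduced by reading the phrase three letters at a time, before and after the deletion. This is a minimal sketch; the helper name is illustrative.

```python
def read_in_triplets(text):
    """Split text into consecutive three-letter 'codons' from the first letter."""
    return [text[i:i + 3] for i in range(0, len(text), 3)]


phrase = "jimsatoutthedayforhismomandbudtedandgotallsixforthebigbadsum"
intact = read_in_triplets(phrase)

# Delete the single letter 's' from the word 'his'.
s_position = phrase.index("his") + 2
mutant = phrase[:s_position] + phrase[s_position + 1:]
shifted = read_in_triplets(mutant)

print("intact :", " ".join(intact))
print("shifted:", " ".join(shifted))
```

Every triplet upstream of the deletion is unchanged; every triplet from the deletion onward is read out of register.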
RNA
Figure 6-51 Molecular Biology of the Cell (© Garland Science 2008)
CODING SEQUENCE IS CONSERVED SEQUENCE ACROSS SPECIES
LEPTIN GENE ALIGNMENT
Figure 4-76 Molecular Biology of the Cell (© Garland Science 2008)
THERE IS GREATER EVOLUTIONARY PRESSURE
TO CONSERVE CODING SEQUENCE (EXONS) THAN
INTRON SEQUENCES
Figure 4-78 Molecular Biology of the Cell (© Garland Science 2008)
DNA, the puzzle.2
Humans have approximately 23,000 genes (down from the 80-140k prediction)
Genes are dispersed along the chromosomes in what appears to be a random
fashion, although many gene clusters exist which seem to aid coordinate
expression: globin, histone, immunoglobulin, MHC, etc.
Some chromosomes are richer in genes than others, although
chromosome size roughly correlates with gene number
A gene’s location is termed its locus, as we have touched upon
Genes vary in size, from beginning to end,
and in their number of exons, whose tally following splicing
must equal an open reading frame, or ORF
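The requirement that the spliced exons add up to an open reading frame can be written as a simple check. The function below is a sketch under the definition used in this lecture (begins with ATG, ends with one of the three termination codons, no termination codon in between, whole number of triplets); its name and the test sequences are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_open_reading_frame(coding_sequence):
    """True if the sequence reads ATG ... stop in triplets with no early stop."""
    seq = coding_sequence.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    starts_correctly = codons[0] == "ATG"
    ends_correctly = codons[-1] in STOP_CODONS
    no_early_stop = not any(codon in STOP_CODONS for codon in codons[1:-1])
    return starts_correctly and ends_correctly and no_early_stop


print(is_open_reading_frame("ATGGTGCACTAA"))   # intact frame
print(is_open_reading_frame("ATGTAAGTGTAA"))   # stop codon in the middle
print(is_open_reading_frame("ATGGTGCACTA"))    # one base short of a triplet
```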
Exon size varies, but averages about 200 basepairs (based on my
knowledge of the Ig superfamily members); their translated sequences often
equate to ‘domains,’ units of primary amino acid sequence that perform function
The average protein is 45 kDa (110 Da for the MW of an average amino acid); the
average size of a spliced gene (mRNA) is 1.5 kb, therefore, the amount of coding
sequence in the human genome is 0.14%
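The first step of this estimate, going from protein mass to coding length, can be worked through directly. The sketch below uses only the three figures quoted above (45 kDa, 110 Da, 1.5 kb); the variable names are illustrative, and the calculation stops at the single-gene level.

```python
AVG_PROTEIN_DA = 45_000     # average protein mass, from the slide
AVG_AMINO_ACID_DA = 110     # average amino acid mass, from the slide
AVG_MRNA_BASES = 1_500      # average spliced mRNA length, from the slide

# Number of amino acids in an average protein.
residues = AVG_PROTEIN_DA / AVG_AMINO_ACID_DA

# Three bases per codon gives the coding length of the message.
coding_bases = 3 * residues

# Share of the average mRNA that is actually translated.
coding_share_of_mrna = coding_bases / AVG_MRNA_BASES

print(f"amino acids per protein: {residues:.0f}")
print(f"coding bases per mRNA  : {coding_bases:.0f}")
print(f"coding share of mRNA   : {coding_share_of_mrna:.0%}")
```

The remainder of the 1.5 kb message corresponds to untranslated sequence such as the 5’UTR and 3’UTR.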
http://www.cshlp.org/ghg5_all/section/gene.shtml
BIG GENES