[Figure: transcript with 5’ and 3’ ends]

DNA STRUCTURE
DOUBLE HELIX
[Figure: antiparallel DNA strands (5’ to 3’ paired with 3’ to 5’), with hydrogen bonds between bases (Fig.1.8)]
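The base pairing in the figure can be written as a few lines of Python. This is a minimal sketch that assumes only Watson-Crick pairs (A-T held by two hydrogen bonds, G-C by three) and ignores helix geometry; `reverse_complement` and `hydrogen_bonds` are illustrative names, not from any library. Because the strands are antiparallel, the partner strand written 5’ to 3’ is the complement read backwards.

```python
# Watson-Crick partner of each base, and the hydrogen bonds that pair forms
PAIRS = {"A": ("T", 2), "T": ("A", 2), "G": ("C", 3), "C": ("G", 3)}


def reverse_complement(strand: str) -> str:
    """Partner strand, written 5' to 3' (the complement read in reverse)."""
    return "".join(PAIRS[base][0] for base in reversed(strand.upper()))


def hydrogen_bonds(strand: str) -> int:
    """Hydrogen bonds between bases when the strand is fully paired."""
    return sum(PAIRS[base][1] for base in strand.upper())


top = "ATGCGC"  # written 5' -> 3'
print(reverse_complement(top))  # GCGCAT
print(hydrogen_bonds(top))      # 16
```

Applying `reverse_complement` twice gives back the original strand, a quick sanity check.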
HOW TO DEFINE A GENE? (there are many descriptions...)
- sequence of DNA essential for specific function
- codes for protein or structural RNA
[Figure: a “structural” gene on DNA (5’ … ATG … TAA … 3’), i.e. gene + flanking regulatory sequences; transcription & RNA processing give the RNA (5’ … AUG … UAA … 3’)]
UTRs - untranslated regions that flank the coding sequence in an mRNA
(so they lie within the transcribed region)
Where is the translation initiation site?
Where is the transcription initiation site?
Where is the promoter?
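To make the UTR definition concrete, the sketch below splits a toy mRNA into 5’ UTR, coding region and 3’ UTR. It assumes, for illustration only, that translation starts at the first AUG and ends at the first in-frame stop codon (UAA, UAG or UGA); real start-site choice depends on sequence context, and `split_mrna` is a hypothetical name.

```python
STOPS = {"UAA", "UAG", "UGA"}


def split_mrna(mrna: str) -> tuple[str, str, str]:
    """Split an mRNA (5' -> 3') into (5' UTR, coding region, 3' UTR).

    Assumes the coding region starts at the first AUG and runs to the
    first in-frame stop codon, which is counted as part of the coding region.
    """
    start = mrna.find("AUG")
    if start == -1:
        return mrna, "", ""  # no start codon: everything is untranslated
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOPS:
            end = i + 3
            return mrna[:start], mrna[start:end], mrna[end:]
    return mrna[:start], mrna[start:], ""  # no in-frame stop in this sequence


utr5, cds, utr3 = split_mrna("GGCAUGGCUUUCUAAGCAAA")
print(utr5, cds, utr3)  # GGC AUGGCUUUCUAA GCAAA
```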
Eukaryotic (but not prokaryotic) genes usually contain introns
[Figure: DNA (5’ to 3’) with “Exon 1” (containing ATG), Intron 1, Exon 2, Intron 2 and “Exon 3” (containing TAA); below it the mRNA: 5’ UTR, Exon 1, Exon 2, Exon 3 and 3’ UTR, with the coding region running from ATG to TAA]
Intron - non-coding sequences removed from pre-RNA (by splicing)
Exon - sequences that remain in mature RNA (mostly coding)
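Splicing can be modelled, very loosely, as keeping the exon intervals of the pre-RNA and joining them in order. The sketch below assumes the exon coordinates are already known (0-based, end-exclusive, like Python slices); the sequence and coordinates are invented, and real splice-site recognition is not modelled, although the toy introns do start with GU and end with AG as most introns do.

```python
def splice(pre_rna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (0-based, end-exclusive) in 5' -> 3' order."""
    return "".join(pre_rna[start:end] for start, end in sorted(exons))


pre_rna = (
    "AUGGCC"        # exon 1    [0, 6)
    "GUAAGUUUUAG"   # intron 1  [6, 17)  starts GU, ends AG
    "GAAUUU"        # exon 2    [17, 23)
    "GUCCCAG"       # intron 2  [23, 30) starts GU, ends AG
    "UAA"           # exon 3    [30, 33)
)
mature = splice(pre_rna, [(0, 6), (17, 23), (30, 33)])
print(mature)  # AUGGCCGAAUUUUAA
```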
Nomenclature “problem”:
• Textbooks (& papers) often show only coding sequences as exons,
but the first exon includes the 5’UTR and the last exon includes the 3’UTR
• This is a dilemma because the positions of the RNA ends are often not known,
or differ between tissues
• Introns can also occur within UTR regions
Example of human pax6 gene
Lines: introns; bars: exons (tall bars: coding exons, short bars: non-coding exons)
What does the bent arrow signify?
Where would the initiation and stop codons be?
Mercer Nat Rev Genet 10: 155, 2009
1. Human genes:
Intron length: typically ~200 nt to > 10 kb
Number per gene: several to dozens…
Exon length: typically 100 - 200 nt
Extreme example: dystrophin gene (~2400 kb) with ~78 introns!!
Tennyson, Klamut & Worton (1995) “The human dystrophin gene requires 16
hours to be transcribed and is cotranscriptionally spliced” Nat Genet.9:184-90
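The two numbers above imply an average elongation rate. A back-of-the-envelope calculation, assuming the rounded ~2400 kb and ~16 h figures and 1 kb = 1000 nt:

```python
# Rounded figures quoted above for the human dystrophin gene
gene_length_nt = 2400 * 1000      # ~2400 kb
transcription_time_min = 16 * 60  # ~16 hours

rate_nt_per_min = gene_length_nt / transcription_time_min
print(rate_nt_per_min)       # 2500.0
print(rate_nt_per_min / 60)  # roughly 42 nt per second
```

This is only an average implied by rounded values, not a measured instantaneous rate.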
Genes-within-genes!
Other genes are sometimes located within long introns!
… in same or opposite orientation (see Practice set #1, question 4)
2. Plant genes:
Intron density similar to animals, but shorter length: typically 100 - 300 nt
3. Yeast genes:
< 5% have introns (vs. mammals, where >95% of genes have introns)
- mostly in tRNA genes (intron length ~ 20-30 nt)
…and in ribosomal protein genes (intron length ~ 100-500 nt)
Structure of NF2 (neurofibromatosis type II) gene in
various animals
What features of this gene are different among these animals?
Golovnina et al. BMC Evol Biol 2005
Bacterial genes are often organized in operons
with short intergenic spacers
- polycistronic mRNA, but each gene
has its own start and stop codons
[Figure: operon with Gene A, Gene B and Gene C]
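Since each gene in a polycistronic mRNA keeps its own start and stop codons, the coding regions can be picked out by scanning for AUG … stop stretches. A minimal sketch, assuming AUG is the only start codon (bacteria also use e.g. GUG and UUG) and using a made-up toy operon; real gene finding also uses ribosome-binding sites. `find_orfs` is a hypothetical name.

```python
STOPS = {"UAA", "UAG", "UGA"}


def find_orfs(mrna: str) -> list[tuple[int, int]]:
    """Return (start, end) of each AUG...stop ORF; end-exclusive, stop included."""
    orfs = []
    i = 0
    while i <= len(mrna) - 3:
        if mrna[i:i + 3] != "AUG":
            i += 1
            continue
        # Walk codon by codon from the AUG until an in-frame stop codon
        end = None
        for j in range(i, len(mrna) - 2, 3):
            if mrna[j:j + 3] in STOPS:
                end = j + 3
                break
        if end is None:
            break  # no in-frame stop: ORF runs past the end of the sequence
        orfs.append((i, end))
        i = end  # resume after the stop, so internal AUGs are not reported
    return orfs


operon_mrna = (
    "GGAGG"          # toy 5' leader
    "AUGAAAUAA"      # gene A
    "CG"             # short intergenic spacer
    "AUGCCCGGGUGA"   # gene B
    "UU"             # short intergenic spacer
    "AUGUUUUAG"      # gene C
    "A"
)
for start, end in find_orfs(operon_mrna):
    print(start, end, operon_mrna[start:end])
# 5 14 AUGAAAUAA
# 16 28 AUGCCCGGGUGA
# 30 39 AUGUUUUAG
```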
But neighbouring operons might be in opposite orientation in the genome
[Figure: Gene 2 (left) and Gene 1 (right)]
5’…ATAGGACAT gatcgctctataggaggtgc ATGCAATGG…3’
3’…TATCCTGTA ctagcgagatatcctccacg TACGTTACC…5’
Aside: My examples will often show
unrealistically short sequences
What are the N-terminal sequences of the proteins encoded by genes 1 and 2?
See also Practice question #2
Where would promoter(s) for genes 1 and 2 be located?
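One way to check a reading of both strands: the sketch below takes the top strand 5’…ATAGGACAT gatcgctctataggaggtgc ATGCAATGG…3’ (upper case coding, lower case spacer, ends elided), generates the bottom strand written 5’ to 3’, and prints the first codons of each gene so they can be looked up in the genetic code table. It assumes gene 1 is the right-hand upper-case block on the top strand and gene 2 the left-hand block on the bottom strand; note that simply searching for the first ATG would fail here, since ATG also appears across the block boundaries.

```python
# Complement table that keeps upper/lower case
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Other strand, written 5' -> 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def codons(seq: str) -> str:
    """Split a sequence into space-separated triplets."""
    return " ".join(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))


top = "ATAGGACAT" + "gatcgctctataggaggtgc" + "ATGCAATGG"  # 5' -> 3'
bottom = reverse_complement(top)                          # 5' -> 3'

gene1 = top[-9:]     # right-hand block, read on the top strand
gene2 = bottom[-9:]  # left-hand block, read on the bottom strand
print("Gene 1:", codons(gene1))  # ATG CAA TGG
print("Gene 2:", codons(gene2))  # ATG TCC TAT
```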
Genes located close together but encoded on opposite strands are
sometimes also seen in eukaryotic genomes (bidirectional promoter?)
Adachi & Lieber Cell 109: 807, 2002
RNA structure
Features of RNA vs. DNA
RNA synthesis
[Figure: “coding strand” (5’ to 3’) and template strand (3’ to 5’) during RNA synthesis (Fig.1.11; Alberts Fig.6.4)]
mRNA has the same sequence as the coding strand (except U instead of T)
RNA is synthesized in the 5’ to 3’ direction,
with the antiparallel DNA strand as template
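The two rules above (U replaces T; synthesis runs 5’ to 3’ on an antiparallel template) can be checked with a short sketch. It assumes the template strand is written 3’ to 5’ from left to right, so complementing it base by base already gives the RNA in 5’ to 3’ order; `transcribe_from_template` is a hypothetical name.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe_from_template(template_3to5: str) -> str:
    """mRNA (5' -> 3') from a template strand written 3' -> 5', left to right.

    Each template base pairs with its complementary ribonucleotide, U for T.
    """
    return "".join(COMPLEMENT[b] for b in template_3to5).replace("T", "U")


template = "TACCCTAACGGGCGG"  # 3' -> 5'
coding = "ATGGGATTGCCCGCC"    # 5' -> 3'
mrna = transcribe_from_template(template)
print(mrna)                             # AUGGGAUUGCCCGCC
print(mrna == coding.replace("T", "U"))  # True
```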
RNA content of a cell (Fig.1.12)
small regulatory RNAs:
snRNAs (small nuclear)
- role in splicing
snoRNAs (small nucleolar)
- role in methylation of rRNAs
miRNAs (microRNAs) & siRNAs (short interfering RNAs)
- role in regulation of expression of individual genes
small non-coding (nc) regulatory RNAs (sRNAs) are also present in bacteria
RNA processing in eukaryotes
- presence of long introns (& short exons) can make
finding genes in eukaryotic DNA sequences difficult
- there may be alternative splicing pathways, so more than one protein
can be generated from one gene (discussed later, Chapter 6)
Fig.1.13
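A toy illustration of one gene giving more than one mRNA: the sketch treats exon 2 as optional (a "cassette" exon) and builds every isoform from made-up exon sequences. Exon skipping is only one kind of alternative splicing, and `isoforms` is a hypothetical name.

```python
from itertools import product


def isoforms(exons: list[str], optional: list[int]) -> list[str]:
    """Every mRNA obtained by including or skipping each optional exon."""
    results = []
    for choice in product([True, False], repeat=len(optional)):
        skipped = {idx for idx, keep in zip(optional, choice) if not keep}
        results.append(
            "".join(seq for idx, seq in enumerate(exons) if idx not in skipped)
        )
    return results


exons = ["AUGGCU", "GAAUGG", "AAAUAA"]  # exon 1, exon 2 (optional), exon 3
for mrna in isoforms(exons, optional=[1]):
    print(mrna)
# AUGGCUGAAUGGAAAUAA
# AUGGCUAAAUAA
```

Because the skipped exon here is 6 nt long (a multiple of 3), exon 3 stays in the same reading frame in both isoforms.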
Link between transcriptome & proteome
Mediated by tRNAs
(codon-anticodon)
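Codon-anticodon pairing is antiparallel, so the anticodon written 5’ to 3’ is the reverse complement of the codon. A minimal sketch, assuming strict Watson-Crick pairing (wobble pairs and modified bases such as inosine are ignored); `anticodon` is a hypothetical name.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon: str) -> str:
    """Anticodon (5' -> 3') that pairs antiparallel with an mRNA codon (5' -> 3')."""
    return "".join(RNA_PAIR[b] for b in reversed(codon))


print(anticodon("AUG"))  # CAU
print(anticodon("GGA"))  # UCC
```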
Genetic code
“standard code”
- can deduce amino acid sequence of protein from nt coding sequence
… using genetic code table
Fig.1.2
See Practice question #1
Fig.1.20
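Deducing a protein sequence from a coding sequence is a codon-by-codon table lookup. The sketch builds the standard code from the usual compact 64-letter string (codons in TCAG order, first base varying slowest, "*" for stop) and translates until the first stop codon. It is an illustrative sketch; `translate` and `STANDARD_CODE` are hypothetical names.

```python
BASES = "TCAG"
# Amino acids for all 64 codons in TCAG order (first base slowest); "*" = stop
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(coding_seq: str, to_stop: bool = True) -> str:
    """Translate a coding sequence (DNA or mRNA, 5' -> 3') in frame from its start."""
    seq = coding_seq.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = STANDARD_CODE[seq[i:i + 3]]
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGGATTGCCCGCCTAA"))                 # MGLPA
print(translate("AUGGGAUUGCCCGCCUAA", to_stop=False))  # MGLPA*
```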
PROTEIN-CODING GENES
DNA divided into triplets (codons)
5’ …. ATG GGA TTG CCC GCC …. 3’  “coding strand”
3’ …. TAC CCT AAC GGG CGG …. 5’  “template strand”
mRNA 5’ …. AUG GGA UUG CCC GCC …. 3’
- in research papers, DNA is usually shown as single-stranded,
with the coding strand in 5’ to 3’ orientation (left to right)
… so the genetic code table can be used directly
One-letter amino acid abbreviations are often used instead of three-letter ones
[Figure: genetic code table with the initiation codon and translation termination codons marked]
Remember that although AUG is the standard initiation codon,
there can also be AUG triplets within an ORF,
… specifying internal Met residues in the protein
Also, when analyzing DNA sequence data obtained in the lab, the initiation codon
might lie outside the sequenced region
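When the initiation codon may lie outside the sequenced region, a common first step is to translate all three forward reading frames and look for one without stop codons. A minimal sketch, assuming the standard code and a made-up 19-nt fragment from the coding strand (5’ to 3’); a real analysis would also check the three frames of the other strand. `translate_frame` is a hypothetical name.

```python
BASES = "TCAG"
# Standard code: amino acids for codons in TCAG order; "*" = stop
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_frame(seq: str, frame: int) -> str:
    """Translate one forward reading frame (offset 0, 1 or 2); '*' marks stops."""
    return "".join(
        STANDARD_CODE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3)
    )


# Hypothetical fragment from inside a gene; its start codon is upstream
fragment = "CGCTAAACTGATGGCATGG"
for frame in range(3):
    print(frame, translate_frame(fragment, frame))
# 0 R*TDGM
# 1 AKLMAW
# 2 LN*WH
```

Only the frame at offset 1 is open, and the M inside it comes from an in-frame ATG that is not the initiation codon; the M at the end of offset 0 comes from an ATG in a different frame.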
Alberts Fig. 6-50
Examples of deviation from the standard genetic code
in mitochondria and microbes
Table 1.3
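As one concrete case, the vertebrate mitochondrial code (NCBI translation table 2) reads AGA and AGG as stop codons, AUA as Met and UGA as Trp. The sketch derives it from the standard table by applying those four reassignments (in DNA letters); other variant codes have different changes, and the names used are hypothetical.

```python
BASES = "TCAG"
# Standard code: amino acids for codons in TCAG order; "*" = stop
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Reassignments in the vertebrate mitochondrial code
VERTEBRATE_MITO_CODE = {**STANDARD_CODE, "AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"}


def translate(seq: str, code: dict[str, str]) -> str:
    """Translate every complete codon of seq with the given code ('*' = stop)."""
    return "".join(code[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


seq = "ATGATATGAAGA"
print(translate(seq, STANDARD_CODE))         # MI*R
print(translate(seq, VERTEBRATE_MITO_CODE))  # MMW*
```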
PROTEIN SEQUENCE & STRUCTURE
Fig.13.24
Fig.1.17
Different proteins can be generated from a single precursor polypeptide
through post-translational events
…so the proteome (set of proteins) can be larger than predicted from the
number of genes in the genome
Cis-acting element:
DNA (or RNA) sequences near a gene that are
important for its expression
Latin word “cis” means “on the same side as”
Trans-acting factor:
protein (or RNA) that binds to a cis-element to
control gene expression
[Figure: gene on DNA, 5’ … ATG … TAA … 3’]
Cis-elements can actually be quite far
away from the genes they control: in
intergenic spacers (ENCODE project)
and within introns