on Translation


More on translation
How DNA codes proteins
The primary structure of each protein (the sequence of
amino acids in the polypeptide chains that make up the
protein) is encoded as a sequence of codons. Each
codon is a triplet of nucleotides. Traditionally, codons are
written in the A, G, C, U alphabet of RNA. Most organisms
use the same genetic code, i.e., the same correspondence
between codons and amino acids.
There are 4 × 4 × 4 = 64 potential codons.
Most amino acids can be coded by more than one codon.
Three of the codons (UAA, UAG, UGA) are STOP codons. They
signal the end of a coding sequence. One codon, AUG, which
codes for methionine, doubles as a START, or initiation,
codon that signals the beginning of a coding sequence.
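The count of 64 can be checked by enumerating every triplet over the RNA alphabet. The short Python sketch below does that and then removes the three STOP codons of the standard code, leaving 61 sense codons to be shared among the 20 amino acids, which is why most amino acids must have several codons. The variable names are illustrative only.

```python
from itertools import product

# Every codon is an ordered triplet over the four RNA bases,
# so there are 4 * 4 * 4 = 64 possible codons.
BASES = "UCAG"
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# The three STOP codons of the standard genetic code.
STOP_CODONS = {"UAA", "UAG", "UGA"}
sense_codons = [c for c in codons if c not in STOP_CODONS]

print(len(codons))        # 64
print(len(sense_codons))  # 61 codons left for the 20 amino acids
```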
The genetic code
First      Second position                  Third
position   G       A       C       U        position

G          Gly     Glu     Ala     Val      G
           Gly     Glu     Ala     Val      A
           Gly     Asp     Ala     Val      C
           Gly     Asp     Ala     Val      U

A          Arg     Lys     Thr     Met      G
           Arg     Lys     Thr     Ile      A
           Ser     Asn     Thr     Ile      C
           Ser     Asn     Thr     Ile      U

C          Arg     Gln     Pro     Leu      G
           Arg     Gln     Pro     Leu      A
           Arg     His     Pro     Leu      C
           Arg     His     Pro     Leu      U

U          Trp     STOP    Ser     Leu      G
           STOP    STOP    Ser     Leu      A
           Cys     Tyr     Ser     Phe      C
           Cys     Tyr     Ser     Phe      U
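The table can also be written as a lookup from codon to amino acid. The Python sketch below is one compact way to do it; the 64-letter string lists one-letter amino acid codes for the codons taken in U, C, A, G order at each position, which is a common convention and our own encoding rather than something from the slides. Counting the entries confirms that three codons are STOP codons, that Leu, Ser and Arg each have six codons, and that Met and Trp have only one.

```python
from collections import Counter
from itertools import product

# One-letter amino acids of the standard genetic code for the codons
# UUU, UUC, UUA, UUG, UCU, ... (bases in U, C, A, G order at the first,
# second and third position). "*" marks a STOP codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Three-letter names, matching the abbreviations used in the table.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "STOP",
}

# Pair each codon (generated in the same U, C, A, G order) with its amino acid.
CODON_TABLE = {
    "".join(codon): THREE_LETTER[aa]
    for codon, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)
}

# How many codons code for each amino acid (the degeneracy of the code).
degeneracy = Counter(CODON_TABLE.values())

print(CODON_TABLE["AUG"])   # Met
print(degeneracy["Leu"])    # 6
print(degeneracy["STOP"])   # 3
```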
URL of the day
Codon Usage Database
http://www.kazusa.or.jp/codon/
The frequency of codon usage in various organisms may
be searched at this site.
Homework 2.1
Look up the codon usage table for each of the following
organisms:
- Mycoplasma gallisepticum R
- Haemophilus influenzae PittGG
- Homo sapiens (nuclear genome)
- Homo sapiens (mitochondrial genome)
and use it to calculate the average length of a
polypeptide chain in that organism. Be sure to explain
the ideas behind your calculations.
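One possible line of reasoning, sketched in Python under a stated assumption: if every coding sequence ends with exactly one STOP codon, then the number of STOP codons in a usage table approximates the number of coding sequences, and every other codon contributes one amino acid, so the ratio of sense codons to STOP codons estimates the average chain length. Whether the table gives raw counts or frequencies per thousand, only the ratio matters. Some genomes, such as human mitochondria, use a variant genetic code with a different set of STOP codons, so the STOP set is a parameter here. The function name and the toy numbers are hypothetical.

```python
# Standard STOP codons (RNA alphabet). Variant genetic codes, such as the one
# used by human mitochondria, have a different STOP set; pass it explicitly.
STANDARD_STOPS = frozenset({"UAA", "UAG", "UGA"})


def average_polypeptide_length(codon_counts, stop_codons=STANDARD_STOPS):
    """Estimate the mean number of amino acids per polypeptide chain.

    codon_counts maps each codon to its count (or frequency) in a codon usage
    table. Assumption: every coding sequence ends with exactly one STOP codon,
    so the STOP total approximates the number of coding sequences, and every
    other codon adds one amino acid to some chain.
    """
    stops = sum(n for codon, n in codon_counts.items() if codon in stop_codons)
    if stops == 0:
        raise ValueError("the table contains no STOP codons")
    sense = sum(n for codon, n in codon_counts.items() if codon not in stop_codons)
    return sense / stops


# Toy numbers invented for illustration; they are NOT from any real organism.
toy_counts = {"AUG": 10, "GCU": 250, "UAA": 6, "UGA": 4}
print(average_polypeptide_length(toy_counts))  # 26.0
```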
Reading frames
Suppose we are given the following nucleotide sequence:
5’ … AAGTTTCCGAGCTGACGGGACT… 3’
Provided this is a protein coding region, which amino acids
does it encode? Let us see how we could parse the above
sequence.
5’ … AAG TTT CCG AGC TGA CGG GAC T… 3’
which translates into:
... Lys Phe Pro Ser STOP
Alternatively, we could have:
5’ … A AGT TTC CGA GCT GAC GGG ACT … 3’
which translates into:
... Ser Phe Arg Ala Asp Gly Thr …
or:
5’ … AA GTT TCC GAG CTG ACG GGA CT … 3’
which translates into:
… Val Ser Glu Leu Thr Gly ...
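The three parsings above differ only in where the first codon starts (offset 0, 1 or 2). A minimal Python sketch of that idea follows; translate is a hypothetical helper, and the 64-letter string is our compact encoding of the standard genetic code table (codons in U, C, A, G order). Because the sequence is written in the DNA alphabet (T) while the table uses U, the sketch replaces T with U before each lookup.

```python
from itertools import product

# Standard genetic code: one-letter amino acids for the codons UUU, UUC, UUA,
# UUG, UCU, ... (bases in U, C, A, G order); "*" marks STOP.
_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
_NAMES = dict(zip("ARNDCQEGHILKMFPSTWYV*",
                  "Ala Arg Asn Asp Cys Gln Glu Gly His Ile "
                  "Leu Lys Met Phe Pro Ser Thr Trp Tyr Val STOP".split()))
CODON_TABLE = {"".join(c): _NAMES[a]
               for c, a in zip(product("UCAG", repeat=3), _CODE)}


def translate(dna, offset=0, stop_at_stop=True):
    """Translate dna read in triplets starting at offset (0, 1 or 2).

    Leftover bases that do not fill a whole codon are ignored. With
    stop_at_stop=True, translation ends at (and includes) the first STOP.
    """
    rna = dna.upper().replace("T", "U")
    peptide = []
    for i in range(offset, len(rna) - 2, 3):
        residue = CODON_TABLE[rna[i:i + 3]]
        peptide.append(residue)
        if stop_at_stop and residue == "STOP":
            break
    return peptide


print(" ".join(translate("AAGTTTCCGAGCTGACGGGACT")))
# Lys Phe Pro Ser STOP
```

Calling translate with offset=1 and offset=2 on the same sequence gives Ser Phe Arg Ala Asp Gly Thr and Val Ser Glu Leu Thr Gly, the other two parsings. The dots that stand for unknown flanking sequence must be left out of the input string.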
More reading frames
But what if the coding sequence lies on the other strand?
The reverse complement of
5’ … AAGTTTCCGAGCTGACGGGACT… 3’ is:
5’ … AGTCCCGTCAGCTCGGAAACTT … 3’
This can be parsed as:
5’ … AGT CCC GTC AGC TCG GAA ACT T … 3’
which translates into:
... Ser Pro Val Ser Ser Glu Thr …
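Computing the reverse complement used here takes two steps: replace every base by its pairing partner (A with T, C with G), then reverse the string so that the other strand is again written 5’ to 3’. A minimal Python sketch, with reverse_complement as a hypothetical helper name:

```python
# Pairing rules for DNA: A<->T, C<->G.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the other strand of dna, written 5' to 3'.

    Complementing each base gives the partner strand read 3' to 5';
    reversing the string puts it back in the usual 5' to 3' order.
    """
    return dna.upper().translate(_COMPLEMENT)[::-1]


print(reverse_complement("AAGTTTCCGAGCTGACGGGACT"))
# AGTCCCGTCAGCTCGGAAACTT
```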
Alternatively, we could have:
5’ … A GTC CCG TCA GCT CGG AAA CTT … 3’
which translates into:
... Val Pro Ser Ala Arg Lys Leu …
or:
5’ … AG TCC CGT CAG CTC GGA AAC TT … 3’
which translates into:
… Ser Arg Gln Leu Gly Asn ...
There are six different reading frames
We have seen that each stretch of coding region can be
translated into amino acid sequences in six different ways:
three starting offsets on each of the two strands. These six
ways of parsing a coding sequence are called reading frames.
When we search a genome for the coding regions of genes,
all six reading frames have to be considered.
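Putting the previous steps together, a six-frame translation reads the given strand and its reverse complement at offsets 0, 1 and 2. The Python sketch below is self-contained; the frame labels +1 to +3 (given strand) and -1 to -3 (reverse complement) are our own naming assumption, the 64-letter string is our encoding of the standard genetic code, and STOP codons are kept inline rather than ending the translation.

```python
from itertools import product

# Standard genetic code (codons in U, C, A, G order); "*" marks STOP.
_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
_NAMES = dict(zip("ARNDCQEGHILKMFPSTWYV*",
                  "Ala Arg Asn Asp Cys Gln Glu Gly His Ile "
                  "Leu Lys Met Phe Pro Ser Thr Trp Tyr Val STOP".split()))
CODON_TABLE = {"".join(c): _NAMES[a]
               for c, a in zip(product("UCAG", repeat=3), _CODE)}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Other strand of an upper-case DNA string, written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def translate_frame(dna, offset):
    """Translate every whole codon of dna from offset on; STOP is kept inline."""
    rna = dna.replace("T", "U")
    return [CODON_TABLE[rna[i:i + 3]] for i in range(offset, len(rna) - 2, 3)]


def six_frames(dna):
    """Map frame labels +1..+3 (given strand) and -1..-3 (reverse
    complement) to the amino acid sequences read in those frames."""
    dna = dna.upper()
    frames = {}
    for sign, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = translate_frame(strand, offset)
    return frames


for label, peptide in six_frames("AAGTTTCCGAGCTGACGGGACT").items():
    print(label, " ".join(peptide))
```

Run on the example sequence, the six lines reproduce the six translations worked out by hand above, with the +1 frame continuing past its STOP codon.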
Open reading frames (ORFs)
If an organism does not have introns, each reasonably
long stretch between an initiation codon and a STOP codon
in the same reading frame (called an open reading frame,
or ORF) is potentially the coding region for a polypeptide
chain. (But note that AUG also codes for methionine, so an
AUG inside a coding region need not mark its start!)
In organisms containing introns it is much more difficult to
recognize coding regions. Besides being noncoding
themselves, introns may also introduce frame shifts: the
exon after an intron need not lie in the same reading frame
of the genomic sequence as the exon before it.
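For an intron-free genome, the ORF idea can be sketched as a scan of all six reading frames: open a candidate at the first ATG (DNA alphabet for AUG), treat later in-frame ATGs as methionine, and close it at the next in-frame STOP codon. This is a minimal Python sketch, not a gene finder; find_orfs is a hypothetical name, and the min_codons threshold standing in for "reasonably long" is an arbitrary assumption. Coordinates are reported on the strand that was scanned (the reverse complement for "-").

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # DNA alphabet
START_CODON = "ATG"


def reverse_complement(dna):
    """Other strand of an upper-case DNA string, written 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(dna, min_codons=100):
    """Return (strand, frame, start, end) for each ATG...STOP stretch.

    start/end are 0-based, end-exclusive positions on the scanned strand;
    the ORF spans seq[start:end] and includes its STOP codon. Only stretches
    with at least min_codons codons before the STOP are kept.
    """
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None  # position of the open ATG in this frame, if any
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i  # first ATG opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # look for the next ATG in this frame
    return orfs


print(find_orfs("CCATGAAATTTTAGCC", min_codons=3))
# [('+', 2, 2, 14)]
```

An ORF with no STOP codon before the end of the sequence is not reported, since the sketch cannot tell where it would end.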
Homework 2.2
Suppose the nucleotide sequence
5’ … AGGCTTCAAGGGT … 3’
is part of a coding region.
Find its translation in all six reading frames.