S1. Effects of Mutations on Proteins – Presentations

Day 1
Before presentation, insert images here of dry
and wet earwax (e.g. from the URLs in the notes
section)
Day 1
ABCC11 allele frequencies
A allele = recessive, dry earwax
Do you see any geographical pattern in the distribution of the
ABCC11 alleles (white leads to dry earwax; black produces wet)?
Yoshiura et al. (2006) Nat Genet
Multiple alleles of ABCC11: 538G,
538A, and 538C
• At a particular position in the
ABCC11 gene (nucleotide 538), a
G nucleotide (one allele)
specifies a glycine amino acid
and produces wet earwax
• If, instead, the ABCC11 gene has
an A or a C (two other alleles) at
the same nucleotide position,
then the protein contains an
arginine amino acid and
produces dry earwax
• Which two glycine codons would
become arginine codons if either
an A or a C nucleotide replaced a
G?
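One way to check an answer to the question above is to search the codon table by computer. The sketch below assumes the standard genetic code written with DNA letters (T rather than U); the function name is illustrative only.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def glycine_codons_mutable_to_arginine():
    """Glycine codons in which one G can become A or C and both mutants encode arginine."""
    hits = []
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid != "G":            # "G" is the one-letter code for glycine
            continue
        for pos, base in enumerate(codon):
            if base != "G":              # only a G nucleotide is being replaced
                continue
            mutants = [codon[:pos] + new + codon[pos + 1:] for new in "AC"]
            if all(CODON_TABLE[m] == "R" for m in mutants):   # "R" is arginine
                hits.append(codon)
                break
    return sorted(hits)

print(glycine_codons_mutable_to_arginine())
```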
Day 1
Predicting and classifying effects of
insertion and deletion mutations on
protein coding regions
Day 1
Review – Eukaryotic Translation
1. How do we know where translation of an
mRNA starts?
2. How do we know where translation of an
mRNA ends?
3. What determines the length of a protein?
Day 2
Wormbase Exercise
Goal: use an online database to access DNA
sequences
Because of its simple user interface, we’ll open:
– http://www.wormbase.org
– This is a sequence database for the nematode
Caenorhabditis elegans, a genetic model species
Day 2
Wormbase Exercise
Copy-paste the
highlighted
(uppercase)
DNA sequence
(the CDS) into
a text file
Day 2
Wormbase Exercise
Summary:
You have now seen how to use the
wormbase.org website to identify a gene CDS
Go to wormbase and find a gene you are
interested in. Paste the CDS of your gene into a
text file. Here is one way to find a gene name to
start you off on this activity…
Day 2
Wormbase Exercise: Workflow
Find a gene name
1. Tools > Gbrowse
2. Change the “Region” field:
– I, II, III, IV, V, X are possible chromosomes
– Ranges can generally be from 1–12,000,000
– e.g. enter “IV:2000..40000”
3. Select one of the three-letter+one-number (e.g. egl-1) gene names you see on a gray background in the “Overview” field
Obtain your gene’s CDS
1. Top-right: “Search” (for a gene): enter your gene name
2. Scroll down to “Sequences”
3. Click on the link under “CDS” (random letters and numbers)
4. Select “spliced+UTR”
5. Copy all of the uppercase (shaded) letters and save as a text file
Day 2
Initial CDS Analysis Exercise
• Goal: perform initial sequence analyses of
your CDS
• Process
– Form groups of three or four
– Discuss a strategy for, as efficiently as possible,
determining for your CDS:
• How many nucleotides are in the CDS?
• What is the percent of each nucleotide (%A, %G, %T,
%C) comprising the CDS?
– Then answer those two questions for your CDS
Initial CDS Analysis Exercise
Example Data
• nuo-1
• 1,440 nt
• A: 380 (26.4%)
• T: 356 (24.7%)
• G: 385 (26.7%)
• C: 319 (22.2%)
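One possible strategy for answering the two questions is to let a short script do the counting. This is a minimal sketch, assuming the CDS has been pasted in as plain text; the function name and the short test sequence are made up for illustration.

```python
from collections import Counter

def nucleotide_composition(cds_text):
    """Return the CDS length and, for each nucleotide, its (count, percent)."""
    seq = "".join(cds_text.split()).upper()   # drop line breaks and spaces from the pasted text
    counts = Counter(seq)
    length = len(seq)
    composition = {base: (counts[base], 100 * counts[base] / length) for base in "ATGC"}
    return length, composition

# Hypothetical 9-nt sequence, only to show the output format
length, composition = nucleotide_composition("AATGC\ngcta")
print(length)
for base, (count, percent) in composition.items():
    print(f"{base}: {count} ({percent:.1f}%)")
```

To use it on a real CDS, read the saved text file and pass its contents to the function.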
Day 2
5 min:
In your group, devise a way
to identify what you would
expect the nucleotide
frequencies to be in a
random sequence of
nucleotides.
Initial CDS Analysis Exercise
Example Data
In 1,440 nt of the nuo-1 CDS, how many A, T, G,
C should we expect to see by chance (or,
randomly)?
Day 2
Initial CDS Analysis Exercise
Example Data
            A : T : G : C
Expected  360:360:360:360
Observed  380:356:385:319
Day 2
Day 3
Initial CDS Analysis Exercise:
Chi Square
              A     T     G     C
Expected (E)  360 : 360 : 360 : 360
Observed (O)  380 : 356 : 385 : 319
Chi square test statistic = (O_A-E_A)^2/E_A + (O_T-E_T)^2/E_T + (O_G-E_G)^2/E_G + (O_C-E_C)^2/E_C
(380-360)^2/360 + (356-360)^2/360 + (385-360)^2/360 + (319-360)^2/360
= 7.56
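The same calculation can be scripted so that it is easy to repeat for any gene. A minimal sketch, assuming (as in the example) that every nucleotide is expected equally often; the function name is illustrative.

```python
def chi_square_uniform(observed):
    """Chi square statistic against equal expected counts in every category."""
    total = sum(observed.values())
    expected = total / len(observed)          # e.g. 1,440 nt / 4 nucleotides = 360
    return sum((o - expected) ** 2 / expected for o in observed.values())

nuo1_counts = {"A": 380, "T": 356, "G": 385, "C": 319}   # observed counts for nuo-1
statistic = chi_square_uniform(nuo1_counts)
print(round(statistic, 2))
```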
Initial CDS Analysis Exercise:
Chi Square
• Chi square test statistic = 7.56 (related to the
below table of chi square test statistic values)
• To find the associated p value, we first must
determine the number of degrees of freedom
(df) in our calculation
Day 3
Initial CDS Analysis Exercise:
Chi Square
Day 3
• df = # of categories -1
• In our case we have 4 categories (4 nucleotides;
4 clauses in our chi-square calculation)
• df = 4-1 = 3
Chi square test statistic = (O_A-E_A)^2/E_A + (O_T-E_T)^2/E_T + (O_G-E_G)^2/E_G + (O_C-E_C)^2/E_C
Initial CDS Analysis Exercise:
Chi Square
Day 3
• Chi square test statistic = 7.56
• Degrees of freedom = # of categories (4) -1 = 3
• Look up the p value range that flanks the chi
square value using the chi square table
[Chi square table with the value 7.56 marked]
Initial CDS Analysis Exercise:
Chi Square
Day 3
• Chi square test statistic = 7.56
• Degrees of freedom = # of categories (4) -1 = 3
• 0.1 > p > 0.05: in the df = 3 row of the table, 7.56 lies between the
chi square values listed for p = 0.1 and p = 0.05
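The table lookup can be cross-checked by computing the p value directly. The sketch below uses the closed-form upper-tail probability of the chi square distribution, which is valid only for 3 degrees of freedom; the function name is illustrative.

```python
import math

def chi_square_p_value_df3(statistic):
    """Upper-tail p value for a chi square statistic with exactly 3 degrees of freedom."""
    half = statistic / 2
    # Closed form for df = 3: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    return math.erfc(math.sqrt(half)) + 2 * math.sqrt(half / math.pi) * math.exp(-half)

p = chi_square_p_value_df3(7.56)
print(0.05 < p < 0.1)
```

For other degrees of freedom a statistics package or the printed table is the safer route.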
Initial CDS Analysis Exercise:
Chi Square
• For your gene, perform the chi square test to
determine whether it contains the expected
nucleotide composition
• Then compare results and your interpretation
with your group members
• Do all of your group members come to the
same conclusion?
Day 3
Day 3
CDS Format Exercise
In your group
1. Develop a method for determining how many
codons are in your gene’s CDS
ATGGCTTCGGCAGCAGTCTCTCTTGGGGCTAGGCTAATTGGACAGAACGTGCCGAAGGTGGCTGTGCGCGGAGTCG
TGACATCAGCCCAGAATGCGAACGCTCAAGTGAAACAAGAGAAGACATCGTTCGGAAATTTGAAGGACAGTGATCGTATC
TTCACCAATTTGTATGGTCGCCATGATTATCGATTGAAAGGAGCCATGGCTCGTGGAGACTGGCACAAGACGAAGGAAAT
CATTCTAAAAGGATCTGATTGGATTCTCGGAGAACTCAAAACTTCTGGTCTCCGTGGAAGAGGAGGAGCCGGTTTCCCAT
CCGGAATGAAGTGGGGATTCATGAACAAACCATTCGATGGACGTCCGAAATATCTCGTTGTTAACGCTGATGAAGGAGAG
CCAGGAACTTGTAAAGATCGTGAAATTATGCGTCACGACCCACATAAGCTCATCGAAGGATGCTTGATCGGAGGAGTTGC
AATGGGAGCTCGTGCCGCCTATATTTACATCCGTGGAGAGTTCTACAATGAAGCTTGTATTCTTCAAGAGGCCATCAACG
AGGCCTACAAGGCCGGATATCTCGGAAAAGACTGCCTCGGAACTGGATACAACTTTGACGTCTTCGTTCATCGTGGAGCT
GGAGCCTACATTTGCGGTGAAGAGACTGCTCTTATTGAATCGCTCGAAGGAAAGCAAGGAAAACCACGTCTGAAGCCTCC
ATTCCCAGCTGACATTGGACTTTTCGGTTGTCCAACAACTGTGACCAATGTTGAAACTGTAGCTGTGGCTCCAACAATTT
GTCGTCGTGGTGGTGACTGGTTCGCTTCGTTTGGACGTGAGAGAAATCGCGGAACCAAGCTTTTCTGCATCTCCGGTCAA
GTCAACAATCCATGCACGGTCGAAGAAGAGATGTCCGTTCCATTGAAAGATCTTATCGAGCGTCATTGTGGTGGAGTCAT
CGGTGGATGGGATAACCTTCTTGCCATCATTCCAGGAGGATCTTCTGTCCCACTTATGCCAAAGAACGTTTGCGACACAG
TTCTCATGGATTTCGATGCCCTGGTTGCTGCTCAATCTGGTCTTGGAACCGCCGCGGTTATTGTGATGAACAAGCAAACT
GATATCGTTAAATGCATCGCCCGATTGTCACTTTTCTATAAACACGAATCTTGCGGACAGTGTACTCCATGCCGTGAGGG
ATGCAATTGGTTGAACAAGATGATGTGGCGTTTCGTTGATGGAAAAGCCAAGCCATCGGAGATCGATATGATGTGGGAAC
TTAGTAAACAGATCGAGGGACACACTATTTGTGCTCTTGGAGACGCTGCTGCATGGCCAGTTCAGGGACTCATTCGCCAC
TTCCGACCAGAACTCGAACGCCGAATGGCCGAGTTCCACAAGCAAGTTTTGGCCGAGCAGGGAGCCAAGCAGATCAGTCA
ATGA
Day 3
CDS Format Exercise
In your group
1. Develop a method for determining how many
codons are in your gene’s CDS
2. Where in a CDS is the start codon?
3. The stop codon?
Day 3
CDS Format Exercise
Where in a CDS is the start codon? The stop codon?
ATGGCTTCGGCAGCAGTCTCTCTTGGGGCTAGGCTAATTGGACAGAACGTGCCGAAGGTGGCTGTGCGCGGAGTCG
TGACATCAGCCCAGAATGCGAACGCTCAAGTGAAACAAGAGAAGACATCGTTCGGAAATTTGAAGGACAGTGATCGTATC
TTCACCAATTTGTATGGTCGCCATGATTATCGATTGAAAGGAGCCATGGCTCGTGGAGACTGGCACAAGACGAAGGAAAT
CATTCTAAAAGGATCTGATTGGATTCTCGGAGAACTCAAAACTTCTGGTCTCCGTGGAAGAGGAGGAGCCGGTTTCCCAT
CCGGAATGAAGTGGGGATTCATGAACAAACCATTCGATGGACGTCCGAAATATCTCGTTGTTAACGCTGATGAAGGAGAG
CCAGGAACTTGTAAAGATCGTGAAATTATGCGTCACGACCCACATAAGCTCATCGAAGGATGCTTGATCGGAGGAGTTGC
AATGGGAGCTCGTGCCGCCTATATTTACATCCGTGGAGAGTTCTACAATGAAGCTTGTATTCTTCAAGAGGCCATCAACG
AGGCCTACAAGGCCGGATATCTCGGAAAAGACTGCCTCGGAACTGGATACAACTTTGACGTCTTCGTTCATCGTGGAGCT
GGAGCCTACATTTGCGGTGAAGAGACTGCTCTTATTGAATCGCTCGAAGGAAAGCAAGGAAAACCACGTCTGAAGCCTCC
ATTCCCAGCTGACATTGGACTTTTCGGTTGTCCAACAACTGTGACCAATGTTGAAACTGTAGCTGTGGCTCCAACAATTT
GTCGTCGTGGTGGTGACTGGTTCGCTTCGTTTGGACGTGAGAGAAATCGCGGAACCAAGCTTTTCTGCATCTCCGGTCAA
GTCAACAATCCATGCACGGTCGAAGAAGAGATGTCCGTTCCATTGAAAGATCTTATCGAGCGTCATTGTGGTGGAGTCAT
CGGTGGATGGGATAACCTTCTTGCCATCATTCCAGGAGGATCTTCTGTCCCACTTATGCCAAAGAACGTTTGCGACACAG
TTCTCATGGATTTCGATGCCCTGGTTGCTGCTCAATCTGGTCTTGGAACCGCCGCGGTTATTGTGATGAACAAGCAAACT
GATATCGTTAAATGCATCGCCCGATTGTCACTTTTCTATAAACACGAATCTTGCGGACAGTGTACTCCATGCCGTGAGGG
ATGCAATTGGTTGAACAAGATGATGTGGCGTTTCGTTGATGGAAAAGCCAAGCCATCGGAGATCGATATGATGTGGGAAC
TTAGTAAACAGATCGAGGGACACACTATTTGTGCTCTTGGAGACGCTGCTGCATGGCCAGTTCAGGGACTCATTCGCCAC
TTCCGACCAGAACTCGAACGCCGAATGGCCGAGTTCCACAAGCAAGTTTTGGCCGAGCAGGGAGCCAAGCAGATCAGTCA
ATGA
Does your CDS follow this pattern?
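A short script can check whether a CDS follows the pattern and count its codons at the same time. This is a sketch under the assumption that the CDS is plain DNA text; the function name and the 12-nt test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def describe_cds(cds_text):
    """Report length, codon count, and whether the CDS starts with ATG and ends with a stop."""
    seq = "".join(cds_text.split()).upper()
    whole = len(seq) - len(seq) % 3                      # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, whole, 3)]
    in_frame = len(seq) % 3 == 0
    return {
        "nucleotides": len(seq),
        "codons": len(codons),
        "length_multiple_of_3": in_frame,
        "starts_with_ATG": seq.startswith("ATG"),
        "ends_with_stop": in_frame and bool(codons) and codons[-1] in STOP_CODONS,
    }

print(describe_cds("ATGGCTTCGTGA"))   # hypothetical 12-nt CDS
```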
Day 3
CDS Translation Exercise
Empirically determining the number of sense (amino
acid-encoding) codons:
How many amino acids comprise the translated CDS?
http://www.ebi.ac.uk/Tools/st/emboss_transeq/
Day 3
CDS Translation Exercise (Example)
Results for nuo-1
• Calculated: 1,440 nt CDS / 3 nt per codon = 480 codons; 480 – 1 stop codon = 479
sense (amino acid-encoding) codons
• The translated protein below contains 479 amino acids
Day 3
CDS Translation Exercise
• Perform a computational translation of your
CDS
• Copy and paste the translated sequence into a
text file
• How many amino acids are in the protein?
Does this match your calculation of the
number of sense (amino acid-encoding)
codons?
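For anyone who wants to see what a computational translation does, here is a minimal sketch. It assumes the standard genetic code and reads the CDS in frame 1 from its first nucleotide; it is not the transeq implementation, and the function name is illustrative. The two test sequences are the first 30 and last 12 nucleotides of the nuo-1 CDS shown earlier.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(cds_text):
    """Translate DNA codon by codon; '*' marks a stop codon, 'X' an unrecognized codon."""
    seq = "".join(cds_text.split()).upper()
    # range stops before any trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))

print(translate("ATGGCTTCGGCAGCAGTCTCTCTTGGGGCT"))   # start of the nuo-1 CDS
print(translate("ATCAGTCAATGA"))                     # end of the nuo-1 CDS
```

Counting the letters before the `*` gives the number of amino acids.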
CDS Translation Exercise:
Frequency of Stop Codons
Day 3
NUO-1 amino acid sequence
MASAAVSLGARLIGQNVPKVAVRGVVTSAQNANAQVKQEKTSFGNLKDSDRIFTNLYGRH
DYRLKGAMARGDWHKTKEIILKGSDWILGELKTSGLRGRGGAGFPSGMKWGFMNKPFDGR
PKYLVVNADEGEPGTCKDREIMRHDPHKLIEGCLIGGVAMGARAAYIYIRGEFYNEACIL
QEAINEAYKAGYLGKDCLGTGYNFDVFVHRGAGAYICGEETALIESLEGKQGKPRLKPPF
PADIGLFGCPTTVTNVETVAVAPTICRRGGDWFASFGRERNRGTKLFCISGQVNNPCTVE
EEMSVPLKDLIERHCGGVIGGWDNLLAIIPGGSSVPLMPKNVCDTVLMDFDALVAAQSGL
GTAAVIVMNKQTDIVKCIARLSLFYKHESCGQCTPCREGCNWLNKMMWRFVDGKAKPSEI
DMMWELSKQIEGHTICALGDAAAWPVQGLIRHFRPELERRMAEFHKQVLAEQGAKQISQ*
Question
Does the protein have the number of amino acids we’d predict it
to have, just by chance?
CDS Translation Exercise:
Frequency of Stop Codons
NUO-1 amino acid sequence
MASAAVSLGARLIGQNVPKVAVRGVVTSAQNANAQVKQEKTSFGNLKDSDRIFTNLYGRH
DYRLKGAMARGDWHKTKEIILKGSDWILGELKTSGLRGRGGAGFPSGMKWGFMNKPFDGR
PKYLVVNADEGEPGTCKDREIMRHDPHKLIEGCLIGGVAMGARAAYIYIRGEFYNEACIL
QEAINEAYKAGYLGKDCLGTGYNFDVFVHRGAGAYICGEETALIESLEGKQGKPRLKPPF
PADIGLFGCPTTVTNVETVAVAPTICRRGGDWFASFGRERNRGTKLFCISGQVNNPCTVE
EEMSVPLKDLIERHCGGVIGGWDNLLAIIPGGSSVPLMPKNVCDTVLMDFDALVAAQSGL
GTAAVIVMNKQTDIVKCIARLSLFYKHESCGQCTPCREGCNWLNKMMWRFVDGKAKPSEI
DMMWELSKQIEGHTICALGDAAAWPVQGLIRHFRPELERRMAEFHKQVLAEQGAKQISQ*
Observations
1/480 codons is a stop (nonsense) codon (*)
479/480 codons are sense (amino-acid-encoding) codons
What do we expect?
Day 4
CDS Translation Exercise:
Frequency of Stop Codons
What frequency of
stop codons (“End”
in the table, right) do
we expect to find in
a DNA sequence?
Discuss in your
group and make a
prediction (5 mins.)
Day 4
CDS Translation Exercise:
Frequency of Stop Codons
Because 3/64
codons are stop
codons, we should
see a stop codon, by
chance, once in
every ~21 codons
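The 3/64 figure can be reproduced by listing every possible codon. The sketch assumes all 64 codons are equally likely, which is the same "random sequence" assumption used for the nucleotide frequencies; the names are illustrative.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}
all_codons = ["".join(c) for c in product("ACGT", repeat=3)]   # 4 x 4 x 4 = 64 codons

stop_fraction = len(STOP_CODONS) / len(all_codons)   # 3/64
codons_per_stop = 1 / stop_fraction                  # roughly one stop every 21 codons

def expected_stop_codons(n_codons):
    """Number of stop codons expected by chance among n_codons random codons."""
    return n_codons * stop_fraction

print(len(all_codons), stop_fraction, round(codons_per_stop, 1))
```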
How do we convert
this expectation into
something we can
employ in the chi
square test?
Observation for nuo-1
1/480 codons is a stop (nonsense) codon
479/480 codons are sense (amino-acid-encoding) codons
Expectation
?
Day 4
CDS Translation Exercise:
Frequency of Stop Codons
Observation
1 stop codon
479 sense codons
Expectation (480 codons x 3/64 = 22.5 stop codons)
22.5 stop codons
457.5 sense codons
Chi square = (1-22.5)^2/22.5 + (479-457.5)^2/457.5 = 21.55
df = 1
p = 0.0000034
Therefore, there are significantly fewer stop codons in the
nuo-1 CDS than we would expect by chance
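The test above can be recomputed from the observed counts with a few lines of code. This sketch assumes 3 of 64 codons are stops by chance and uses the closed-form p value that applies only to 1 degree of freedom; the function name is illustrative.

```python
import math

def stop_codon_chi_square(n_stop, n_sense):
    """Chi square statistic and p value (df = 1) for observed stop vs. sense codons."""
    total = n_stop + n_sense
    expected_stop = total * 3 / 64            # e.g. 480 codons x 3/64 = 22.5
    expected_sense = total - expected_stop    # e.g. 457.5
    statistic = ((n_stop - expected_stop) ** 2 / expected_stop
                 + (n_sense - expected_sense) ** 2 / expected_sense)
    p_value = math.erfc(math.sqrt(statistic / 2))   # upper tail of chi square, df = 1
    return statistic, p_value

statistic, p_value = stop_codon_chi_square(1, 479)   # nuo-1: 1 stop, 479 sense codons
print(statistic, p_value)
```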
Day 4
CDS Translation Exercise:
Frequency of Stop Codons
Perform a statistical analysis of the frequency of stop codons
in your coding region and then compare results with your
group
Day 4
CDS Translation Exercise:
Frequency of Stop Codons
Take-home messages
• Stop codons are usually underrepresented in coding
regions
• Stop codons in shorter proteins will be relatively more
common than in longer proteins
Maybe the proteins we’ve happened to be looking at are
particularly long. What’s the normal length of a protein?
Make a prediction: what would you expect, based on what
you know so far, that the average length of a protein is?
Day 4
CDS Translation Exercise:
Frequency of Stop Codons
Day 4
Most frequent protein length in the Swissprot database is ~150 aa
[Figure from Nuel (2006) Algor. Mol. Biol. 1:5, Creative Commons License; annotation marks proteins with length ≤ 21 aa]
Conclusions
• Most proteins are much bigger than 21 aa
• Stop codons really are underrepresented in coding regions
Day 5
Mutation Exercise
Time to mutate your CDS!
The rules:
1) Either insert new or delete existing nucleotides
2) Either write out a 1–12 nucleotide sequence to
insert, or choose 1–12 consecutive nucleotides to
delete
Make the change!
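The change can be made by hand in a text editor, or with a small helper like the sketch below. The function names, the 0-based position convention, and the 9-nt test sequence are assumptions for illustration; the 1–12 nucleotide limit comes from the rules above.

```python
def insert_nucleotides(cds, position, new_nucleotides):
    """Insert 1-12 nucleotides in front of the 0-based index `position`."""
    if not 1 <= len(new_nucleotides) <= 12:
        raise ValueError("insert between 1 and 12 nucleotides")
    return cds[:position] + new_nucleotides.upper() + cds[position:]

def delete_nucleotides(cds, position, count):
    """Delete `count` (1-12) consecutive nucleotides starting at 0-based index `position`."""
    if not 1 <= count <= 12:
        raise ValueError("delete between 1 and 12 nucleotides")
    return cds[:position] + cds[position + count:]

original = "ATGGCTTGA"                           # hypothetical 9-nt CDS
print(insert_nucleotides(original, 3, "c"))      # 1-nt insertion after the start codon
print(delete_nucleotides(original, 3, 3))        # 3-nt deletion of the second codon
```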
Day 5
Mutation Exercise
On your own, make predictions:
Will the mutation you made to your CDS:
• Result in a mutant protein that is:
a) Longer than the original,
b) shorter, or
c) the same length?
• Result in a mutant protein that has:
1) the same amino acid sequence as the original, or
2) a different sequence (severely, somewhat, or barely)?
Day 5
Mutation Exercise
Time to create new groups!
Nucleotides inserted or deleted   Insertion   Deletion
1, 4, 7 or 10 nt                  I1          D1
2, 5, 8 or 11 nt                  I2          D2
3, 6, 9, or 12 nt                 I3          D3
In your group, is there a consensus about your predictions?
Each group should select one member to report out
Day 5
Mutation Exercise
Summary of Predictions
         Protein Length                     Protein Sequence Changes?
Group    Increases   Same   Decreases      Yes: severely   Yes: barely   No
I1
I2
I3
D1
D2
D3
Explain why your group reached the consensus it did
Quantifying the Effects of Mutation
Exercise
Use transeq to translate your mutant CDS
Day 6
Quantifying the Effects of Mutation
Exercise
As a group, find a common way to measure the effect
of your mutation on:
i) the amino acid sequence (is it the same, or is it
different?)
ii) the presence (frequency) of premature termination
codons (PTCs)
iii) the length of the encoded protein
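One possible common measure is sketched below; groups may well settle on something different. It assumes the standard genetic code and translation from the first nucleotide, and every name in it is illustrative. The test sequences are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(cds):
    """Translate in frame 1; '*' marks a stop codon."""
    seq = "".join(cds.split()).upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))

def mutation_effects(original_cds, mutant_cds):
    """Compare the proteins encoded up to the first stop codon in each sequence."""
    original = translate(original_cds).split("*")[0]
    mutant_full = translate(mutant_cds)
    mutant = mutant_full.split("*")[0]
    unchanged = 0
    for a, b in zip(original, mutant):          # amino acids identical from the start
        if a != b:
            break
        unchanged += 1
    return {
        "original_length": len(original),
        "mutant_length": len(mutant),
        "same_sequence": original == mutant,
        "unchanged_leading_amino_acids": unchanged,
        # a stop codon that ends the mutant protein earlier than the original
        "premature_stop": "*" in mutant_full and len(mutant) < len(original),
    }

print(mutation_effects("ATGGCTTCGGCATGA", "ATGTAGCTTCGGCATGA"))   # 2-nt insertion of TA
```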
Day 6
Quantifying the Effects of Mutation
Exercise
Develop a group consensus on:
• How the proteins changed
• What was similar about your mutant proteins
• A brief explanation of why those results are shared
among the group members
• Select one member to report out
Day 6
Quantifying the Effects of Mutation
Exercise
Summary of Observed Effects
         Protein Length                     Protein Sequence Changes?
Group    Increases   Same   Decreases      Yes: severely   Yes: barely   No
I1
I2
I3
D1
D2
D3
Day 6
Quantifying the Effects of Mutation
Exercise
Day 7
Summary of Likely Observed Effects (Key)
         Protein Length                                  Protein Sequence Changes?
Group    Increases   Same       Decreases               Yes: severely        Yes: barely   No
I1       I1a         -          I1b                     I1a, I1b             -             -
I2       I2a         -          I2b                     I2a, I2b             -             -
I3       I3a, I3c    I3b, I3e   I3d                     I3a, I3c, I3d        I3b           I3e
D1       D1a         -          D1b                     D1a, D1b             -             -
D2       D2a         -          D2b                     D2a, D2b             -             -
D3       D3e         -          D3a, D3b, D3c, D3d      D3b, D3c, D3d, D3e   D3a           -
Notes (Protein Length)
Whether the length changes depends in large part on the balance of:
a) how many nucleotides are added/removed by the student
b) the position in the CDS where the mutation occurred (toward the 5’ end is more likely disruptive than toward the 3’ end)
c) the distance to the next stop codon
Notes (Protein Sequence Changes?)
It is very unlikely, but possible, that an insertion or deletion in a coding region could result in no protein sequence change.
N.B. that I3e is technically not an insertion in the coding region, however, because the stop codon is noncoding
Example mutations for each of the six groups have been provided as a discussion tool, to demonstrate
the variety of effects each type of mutation could potentially produce. For each group, different
mutations have been assigned a serial letter for ease of reference.
Day 7
Reversing the Process Exercise
Create a mutation that would populate one empty cell
from your class summary table
         Protein Length                     Protein Sequence Changes?
Group    Increases   Same   Decreases      Yes: severely   Yes: barely   No
I1
I2
I3
D1
D2
D3
Is it possible that every outcome in this table could occur?
Challenge: for next class, identify an indel mutation that
results in no protein sequence change
Day 7
General Conclusions
Impacts of mutations on protein coding sequences:
• Frameshifts (insertions or deletions of a number of nucleotides
not divisible by three, e.g. one or two) often reveal premature termination
codons, thus shortening the protein, and also change
the downstream amino acid sequence of the protein
• All of the types of mutation we worked with can either
increase or decrease protein length (or leave it the same
length) and can change the protein sequence
We used statistics, comparing expected and observed
values, to detect differences
Day 7
Extension
With what we learned in this module
1. What process could you use to find CDSs if
given a large DNA sequence that contains
genes?
2. What feature could help you know it is a
CDS?
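One possible starting point for discussion is a scan for long stretches that begin with ATG and run to the next in-frame stop codon. The sketch below is a simplification: it reads only the forward strand, ignores introns, and uses an arbitrary minimum length (min_codons) that is an assumption, not a value from this module.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=100):
    """Return (start, end) index pairs of ATG...stop stretches on the forward strand."""
    seq = "".join(dna.split()).upper()
    orfs = []
    for frame in range(3):                        # three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                         # first ATG opens a candidate
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i + 3 - start) // 3   # includes the start and stop codons
                if n_codons >= min_codons:
                    orfs.append((start, i + 3))
                start = None                      # look for the next candidate
    return orfs

example = "CC" + "ATG" + "GCT" * 5 + "TGA" + "AA"   # hypothetical 25-nt sequence
print(find_orfs(example, min_codons=5))
```

Because a stop codon is expected by chance about once every 21 codons, stretches much longer than that without a stop are the feature such a scan relies on.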