Lab 4.3: Molecular Evolution
Jennifer Gardy
Molecular Biology & Biochemistry
Simon Fraser University
http://creativecommons.org/licenses/by-sa/2.0/
Goals
• Perform a multiple sequence alignment (MSA) for a family of related proteins
• Create & view a phylogenetic tree showing the
relationships between these proteins
• Use the tree to determine the evolutionary
history of the genes/proteins
• Complete the phylogeny assignment
Outline
• Research Question
• Creating a tree: ClustalX
– An edited alignment
– How Neighbour-Joining works
– Bootstrapping
• Viewing a tree: NJPlot
A Handy Tip
• Glutamine = Gln
• Glutamic Acid = Glu
Research Question
• Protein translation involves
tRNAs
• A tRNA has an anticodon
that recognizes a specific
codon on mRNA, and is
“charged” with a particular
amino acid
Research Question
• How is a tRNA with a particular anticodon charged with
the correct amino acid? By an aminoacyl-tRNA synthetase (aaRS)
• tRNA = red, tRNA synthetase = blue
• The aaRS grips the anticodon (for recognition) and then
adds the correct amino acid at the active site
Research Question
• There are 20 amino acids, therefore there
ought to be 20 tRNA synthetases, no?
No!
• Many bacterial genomes do not contain a
glutaminyl-tRNA synthetase (GlnRS)
So how do bacteria charge their tRNA
with Glutamine?!?!?!
Research Question
• These bacteria mis-acylate their glutamine tRNA (tRNA-Gln) with
Glutamic Acid (Glu)
• The GatABC enzyme complex then converts
the Glutamic Acid to Glutamine
[Figure: GatABC converts Glu to Gln on the charged tRNA; labels shown: CAA, CAG]
Research Question
• However, some bacteria, like E. coli, have a
functioning GlnRS
• Bacterial GlnRS (rarer) and GluRS (more
common) are derived from a common
ancestor
What is the evolutionary history of the
bacterial GlnRS and GluRS genes?
3 Hypotheses
1. Gene loss
• The common ancestor of life on earth had a GlnRS gene, which was subsequently lost in most bacteria
2. Gene duplication
• GlnRS evolved independently in some bacteria after a GluRS underwent a gene duplication event in a bacterial ancestor
3. Horizontal gene transfer
• The GlnRS gene was introduced into a bacterium from another lineage
3 Hypotheses
• Each hypothesis can be described by a
unique phylogenetic tree
• We will build a tree, and you will determine
which hypothesis it best supports
Step 1: The MSA
• MSAs are the basis of phylogenetic analyses,
therefore… make sure it’s a good one!
• Select appropriate
sequences:
– 11 GlnRS and GluRS
sequences
• Human, fly, nematode, yeast, E.
coli, archaea
Step 1: The MSA
• Edit the sequences to remove “uninformative”
regions
– N-terminal signal peptides, frayed ends
– Non-homologous regions
– Poorly aligned regions
• For enzymes (like GlnRS and GluRS), often
best to use only the catalytic core
Step 1: A 30 second MSA
- Day 4 website
- tRNAsynthetases.txt
- ClustalX with default parameters
- Leave it open on your screen
Our Alignment
?
Outgroup: a slightly similar protein descended from the same long-long-long ago common ancestor, but not a member of the group of interest. Used to root the tree.
Step 2: Drawing a Tree
Neighbour-Joining
• Distance matrix method
– % difference between all
pairwise combinations of
sequences measured
– Distances assembled into
one tree
• Works better for related
sequences
• Fast
• Commonly used: THIS IS A
BAD REASON but at least
you will not feel lonely if you
use it!
Maximum Parsimony
• Discrete data method
– Creates all possible trees
that describe the sequences
– Looks at each column in the
MSA to count up # of
evolutionary change events
– Picks the simplest tree that
describes the events
observed
• Works better for highly
divergent sequences
• Slower
• Allows you to trace the
evolution of specific sites
within the molecule
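The column-by-column counting that parsimony relies on can be sketched as follows. This is a minimal illustration, assuming rooted binary trees written as nested tuples and using Fitch's counting rule for each column. The names `fitch_changes` and `parsimony_score` and the toy alignment are hypothetical, and the sketch only scores candidate trees that you hand it; it does not enumerate all possible trees.

```python
def fitch_changes(tree, states):
    """Minimum number of changes needed to explain one alignment column.

    tree   -- nested 2-tuples of leaf names, e.g. (("Rat", "Bat"), ("Cat", "Matt"))
    states -- dict mapping each leaf name to its residue in this column
    """
    changes = 0

    def walk(node):
        nonlocal changes
        if isinstance(node, str):           # leaf: its observed residue
            return {states[node]}
        left, right = (walk(child) for child in node)
        common = left & right
        if common:                          # children agree: no change needed here
            return common
        changes += 1                        # children disagree: count one change
        return left | right

    walk(tree)
    return changes


def parsimony_score(tree, aligned):
    """Sum the per-column change counts over the whole alignment."""
    length = len(next(iter(aligned.values())))
    return sum(
        fitch_changes(tree, {name: seq[i] for name, seq in aligned.items()})
        for i in range(length)
    )


# Toy alignment invented for illustration (not the lab's data).
aligned = {"Cat": "GAT", "Rat": "AAC", "Bat": "AAC", "Matt": "GAT"}
tree_a = (("Rat", "Bat"), ("Cat", "Matt"))
tree_b = (("Rat", "Cat"), ("Bat", "Matt"))
print(parsimony_score(tree_a, aligned), parsimony_score(tree_b, aligned))
```

The tree with the lower score needs fewer change events, so it is the one parsimony would prefer among the candidates given.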
Building an NJ Tree - An Example
Cbw protein from cat, rat, bat and Matt
1. Compare all sequences
to each other.
2. Assign divergence
values to each pair
3. Assemble the values in
a distance matrix
        Cat   Rat   Bat   Matt
Cat     -
Rat     0.7   -
Bat     0.8   0.2   -
Matt    0.6   0.4   0.5   -
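Steps 1 to 3 can be sketched in a few lines. This is a minimal illustration, assuming divergence is measured as the fraction of aligned, gap-free positions that differ. The function names and the toy sequences are hypothetical, and the toy sequences are not meant to reproduce the values in the matrix above.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of aligned, gap-free positions at which two sequences differ."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(aligned):
    """Map each pair of names to its divergence value."""
    return {
        (m, n): p_distance(aligned[m], aligned[n])
        for m, n in combinations(aligned, 2)
    }


# Hypothetical aligned fragments, invented for illustration only.
toy = {
    "Cat": "MKVLAGTWQE",
    "Rat": "MRVLSGTWQD",
    "Bat": "MRVLSGTWQE",
    "Matt": "MKVLSGSWQD",
}
for pair, d in distance_matrix(toy).items():
    print(pair, round(d, 2))
```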
Building an NJ Tree
4. Arrange the subjects in a “star” phylogeny
Building an NJ Tree
5. Fuse the two subjects with the least divergence between them
        Cat   Rat   Bat   Matt
Cat     -
Rat     0.7   -
Bat     0.8   0.2   -
Matt    0.6   0.4   0.5   -

6. Create a new distance matrix, replacing Rat and Bat with the fusion of the two, repeat step 4…

          Cat    RatBat   Matt
Cat       -
RatBat    0.75   -
Matt      0.6    0.45     -
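One fusion step from this example can be sketched as below. It follows the simplified rule on the slide: merge the closest pair, then average their distances to everyone else. The full Neighbour-Joining algorithm chooses the pair with a rate-corrected criterion and also computes branch lengths, so treat this as an illustration of the idea rather than of what ClustalX does internally. The helper names `lookup` and `fuse_closest` are hypothetical.

```python
# Distance matrix from the cat / rat / bat / Matt example
dist = {
    ("Cat", "Rat"): 0.7,
    ("Cat", "Bat"): 0.8,
    ("Rat", "Bat"): 0.2,
    ("Cat", "Matt"): 0.6,
    ("Rat", "Matt"): 0.4,
    ("Bat", "Matt"): 0.5,
}


def lookup(dist, x, y):
    """Read a distance regardless of the order the pair was stored in."""
    return dist[(x, y)] if (x, y) in dist else dist[(y, x)]


def fuse_closest(dist):
    """Merge the least-divergent pair and rebuild the matrix by simple averaging."""
    a, b = min(dist, key=dist.get)
    merged = a + b
    others = []
    for pair in dist:
        for name in pair:
            if name not in (a, b) and name not in others:
                others.append(name)
    # Keep distances that do not involve the fused pair
    new = {pair: d for pair, d in dist.items() if a not in pair and b not in pair}
    # Distance from each remaining subject to the fusion = mean of the two old distances
    for other in others:
        new[(other, merged)] = (lookup(dist, a, other) + lookup(dist, b, other)) / 2
    return merged, new


merged, new_dist = fuse_closest(dist)
print(merged)
for pair, d in new_dist.items():
    print(pair, round(d, 2))
```

Calling `fuse_closest` again on `new_dist` repeats the step until only one cluster remains.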
A Word about Bootstrapping
• 1001 definitions, none of which have to do
with boots. Or straps.
– http://en.wikipedia.org/wiki/Bootstrapping
• In phylogenetic analysis, bootstrapping is a
simple test of phylogenetic accuracy
Does my whole dataset strongly support my
tree? Or was this tree just marginally better than
the other alternatives?
Bootstrapping – The Wordy Version
• Original dataset is “randomly sampled with
replacement”
• Multiple (N = 100, 1000, etc.) “pseudo-datasets” of the same size as the original are
created
• Each of the N pseudo-datasets is used to
create a tree
• If a specific branching order is found in X of
the N trees, that node is given the bootstrap
support value X (often reported as a percentage of N)
• Support values of 70% or more = very reliable
groupings
Bootstrapping – The Picture Version
1. Slice original MSA of Y residues into Y columns, put the columns into a hat
2. Pull out a random column, place it in column #1 of your new test set
“random sampling”
3. Put the column back in the hat
“with replacement”
4. Pull another column from the hat, place it in column #2 in the test set, put it back
5. Repeat until a pseudo-dataset of Y columns has been made
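The hat procedure above can be sketched as follows. This is a minimal illustration, assuming the MSA is held as a list of equal-length strings. The function name `bootstrap_columns` and the toy alignment are hypothetical.

```python
import random


def bootstrap_columns(msa, rng):
    """Build one pseudo-dataset by drawing Y columns with replacement."""
    n_cols = len(msa[0])
    # Each draw picks a column index; the same column may be picked again
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(row[i] for i in picks) for row in msa]


# Toy alignment invented for illustration (not the lab's data).
msa = ["MKVLA", "MRVLA", "MKILG"]
rng = random.Random(42)   # fixed seed so the run is repeatable
pseudo = bootstrap_columns(msa, rng)
for row in pseudo:
    print(row)
```

Because whole columns are drawn, each sequence keeps its row and every pseudo-dataset has the same dimensions as the original alignment.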
Bootstrapping
– Repeat N number of times to generate N pseudo-datasets
– For each pseudo-dataset, draw a tree (yields N trees)
– Compare your tree to all N trees. How often do the branching orders in
your tree appear in the N pseudo-trees?
– On each branch of your tree, write the # of times that branch appeared in the
N pseudo-trees
[Figure: Rat and Bat branch together in 2 of the 3 pseudo-trees, so that branch is labelled 2]
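Counting support for one grouping can be sketched like this. It is a minimal illustration, assuming trees are rooted and written as nested tuples, and treating a "branch" as the set of leaves below it. The three pseudo-trees are invented to match the "2 of 3" example, and the function names are hypothetical.

```python
def clades(tree):
    """Collect the set of leaves found under every internal node."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def support(group, pseudo_trees):
    """Number of pseudo-trees in which the given leaves form their own clade."""
    target = frozenset(group)
    return sum(1 for tree in pseudo_trees if target in clades(tree))


# Hypothetical pseudo-trees, invented for illustration only.
pseudo_trees = [
    ("Cat", ("Matt", ("Rat", "Bat"))),
    ("Cat", ("Matt", ("Bat", "Rat"))),
    (("Cat", "Rat"), ("Bat", "Matt")),
]
print(support({"Rat", "Bat"}, pseudo_trees), "of", len(pseudo_trees))
```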
Step 2. Our Neighbour-Joining Tree
• Trees >
– “Exclude positions w/ gaps”
• Any column with 1+ gaps
• Deletes uninformative regions
• Not good for gappy MSAs
– “Correct for multiple subs.”
• M -> V -> L -> V
• 3, not 1, mutations
• Correction formula makes
distances proportional to time
since divergence
• Trees > Output Format Options
– Bootstrap labels on “node”, not “branch”
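The two tree options above can be sketched as follows. This is a minimal illustration, not ClustalX's own code. The correction shown is Kimura's empirical formula for protein distances, K = -ln(1 - D - D²/5), which is an assumption here: the slides do not say which formula the program applies. Function names are hypothetical.

```python
import math


def drop_gap_columns(msa):
    """Keep only alignment columns in which no sequence has a gap ('-')."""
    kept = [col for col in zip(*msa) if "-" not in col]
    return ["".join(chars) for chars in zip(*kept)]


def kimura_correction(d):
    """Estimate true distance from observed fraction of differences d.

    Assumed formula (Kimura's empirical correction): K = -ln(1 - d - d*d/5).
    """
    inner = 1.0 - d - d * d / 5.0
    if inner <= 0:
        raise ValueError("observed distance too large for this correction")
    return -math.log(inner)


print(drop_gap_columns(["MK-L", "MKVL", "M-VL"]))
for observed in (0.1, 0.2, 0.5):
    print(observed, round(kimura_correction(observed), 3))
```

The corrected value is always at least as large as the observed one, and the gap grows as sequences diverge, which reflects hidden changes such as M -> V -> L -> V.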
Draw a Bootstrapped NJ Tree
• Trees > Bootstrap NJ Tree
• What does this file look like?
(
(
GluRS_Human:0.06092,
GluRS_Nematode:0.06188)
:0.01158,
GluRS_FruitFly:0.06737,
(
GluRS_Yeast:0.09092,
(
(
(
GluRS_E.coli:0.23596,
GluRS_Methanococcus:0.09737)
:0.06384,
TrpRS_Geobacillus:0.65546)
:0.10513,
(
(
(
GlnRS_Human:0.01754,
(
GlnRS_FruitFly:0.01151,
GlnRS_Nematode:0.02357)
:0.00877)
:0.04642,
GlnRS_Yeast:0.05007)
:0.02215,
GlnRS_E.coli:0.06338)
:0.05058)
:0.02915)
:0.01926);
Not very tree-like
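The nested-parentheses text is easier to read once parsed. Below is a minimal sketch of a reader for this kind of file, assuming plain names and branch lengths only (no quoted labels or comments). The function names are hypothetical, and the example string is the first few taxa from the file above.

```python
def parse_tree(text):
    """Parse a parenthesised tree string into nested dicts."""
    text = "".join(text.split()).rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1
            children.append(node())
            while text[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1                              # skip the closing ")"
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos]
        length = None
        if pos < len(text) and text[pos] == ":":  # optional branch length
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    return node()


def leaf_names(tree):
    """List leaf names in the order they appear in the file."""
    if not tree["children"]:
        return [tree["name"]]
    return [name for child in tree["children"] for name in leaf_names(child)]


def render(tree, depth=0):
    """Indented outline: one line per node, '+' for unnamed internal nodes."""
    lines = ["  " * depth + (tree["name"] or "+")]
    for child in tree["children"]:
        lines.extend(render(child, depth + 1))
    return lines


example = "((GluRS_Human:0.06092,GluRS_Nematode:0.06188):0.01158,GluRS_FruitFly:0.06737);"
tree = parse_tree(example)
print("\n".join(render(tree)))
```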
View the Tree with NJPlot
• NJPlot – very basic
• Type njplot at the prompt
• Turn on bootstrap value
display
• Can swap nodes
• Many other tree viewing
programs available
– Treeview, 3 different views:
Remainder of lab time/Evening open lab
• Play with the NJPlot options
– until you like the look of your tree, can clearly see the branching order,
and bootstrap labels are displayed
• Print your tree (Q3 of phylogeny assignment)
• Complete the phylogeny assignment!
– Q4: Bootstrapping
– Q5 & 6: Which hypothesis does your tree support? (see the “3 Hypotheses” slide)
• With your team, sketch the trees you’d expect from the different hypotheses
• Fiona’s slides show some examples of different evolutionary scenarios
• Use your resource sites! NCBI Bookshelf, Google, etc.
• Attention biologists! Karma alert! Help your teammates to understand evolution today, and they’ll help you understand programming tomorrow!
• Module 3 of the Integrated Assignment
– Again, there is LOTS of time available for the IA – you don’t need to
finish it tonight.
– Won’t begin Module 4 until Monday