MBoMS Genomics of Model Microbes Lab 3: Tools for
MBoMS
Genomics of Model Microbes
Lab 5: Recap of Taxplot and Alignments
Comparing Microbial Genomes
• Last time you learned how to find specific
genes in a target microbial genome
– Often, you will be less interested in a specific
gene or protein and more interested in making
more sweeping comparisons between genomes
• The exercises in this lab will teach you how
to use several NCBI-based genome
comparison tools
Exercise 1
• Go to the Microbial Genome
Resource Page
• Find the tool box on the right
edge
• Click on TaxPlot
–Use help to learn a bit about
TaxPlot
TaxPlot
• What is TaxPlot?
– A three-way genome comparison tool based on pre-computed protein BLASTP
E-values
– It displays a point for each protein in the reference genome based on the best
alignment with proteins in each of the two genomes being compared
• What is new about TaxPlot?
– It employs the BLAST Score Ratio (BSR) approach, which classifies all putative
peptides within three genomes using a measure of similarity based on the ratio of
BLAST scores
– BSR analysis is a departure from traditional genome scale analyses as it
overcomes the limitations of BLAST E-values in comparative studies by
normalizing the BLAST raw scores.
• What does TaxPlot provide?
– The output of the BSR analysis enables global visualization of the degree of
proteome similarity between all three genomes
– Additional output enables the genomic synteny (conserved gene order) between
each genome pair to be assessed
– The synteny analysis is overlain with BSR data as a color dimension, enabling
visualization of the degree of similarity of the peptides being compared
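The score-ratio idea above can be sketched in a few lines. This is only an illustration, assuming the ratio is a protein's best-hit score in another genome divided by that protein's score against itself; the function name and all scores are hypothetical and are not TaxPlot output.

```python
def blast_score_ratio(match_score: float, self_score: float) -> float:
    """Normalize a BLAST score by the reference protein's self-hit score."""
    if self_score <= 0:
        raise ValueError("self_score must be positive")
    return match_score / self_score


# Hypothetical scores for one reference protein (illustrative values only).
self_score = 500.0      # reference protein aligned against itself
score_genome_a = 450.0  # best hit in comparison genome A
score_genome_b = 125.0  # best hit in comparison genome B

bsr_a = blast_score_ratio(score_genome_a, self_score)
bsr_b = blast_score_ratio(score_genome_b, self_score)

print(f"BSR vs genome A: {bsr_a:.2f}")
print(f"BSR vs genome B: {bsr_b:.2f}")
```

Because both ratios share the same denominator, they fall on a common scale, so one protein can be placed as a single point with one coordinate per comparison genome.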
Exercise 2
• Now, use TaxPlot to compare
multiple genomes from each of
your two species
–In TaxPlot, choose 3 genomes for one
of your species
• you can scroll through the options at
the top of the page to find all available
genomes for each species
• Let TaxPlot calculate the relationships
between the 3 genomes
–Repeat for the second species
Sample TaxPlot:
E. coli 101 vs E. coli K12 vs E. coli O157:H7
• 4238 query proteins produced 3645 hits
• 2120 hits: K12 and 101 share the greatest similarity
• 1002 equal hits: K12 and O157 share a large number of proteins that are equally similar to 101
• 523 hits
Sample TaxPlot Results
• 4238 proteins in 101 were
compared in a 101 x K12 x O157
three-way comparison
– 1002 of the comparisons were
equivalent for K12 and O157
– 2120 of the comparisons had “better”
scores for K12
– 523 of the comparisons had “better”
scores for O157
• You can click on any of the circles
to get details of the specific gene(s)
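As a quick check on the sample numbers, the three categories can be tallied and turned into percentages. This is a small sketch using only the counts quoted on the slide; the dictionary labels are our own wording.

```python
# Counts quoted for the sample TaxPlot (E. coli 101 as the reference genome).
query_proteins = 4238
counts = {
    "equal for K12 and O157": 1002,
    "better for K12": 2120,
    "better for O157": 523,
}

total_hits = sum(counts.values())
no_hit = query_proteins - total_hits

for label, n in counts.items():
    print(f"{label}: {n} ({100 * n / total_hits:.1f}% of hits)")
print(f"total hits: {total_hits}")
print(f"query proteins without a hit: {no_hit}")
```

The three categories add up to the 3645 hits reported for the plot, which leaves 593 of the 4238 query proteins without a hit.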
Exercise 2, cont.
• Go to the TaxPlot results for each of your
two species
– Compare and contrast the results for the two
species
– Are the genomes from one species more or
less similar than the genomes for the second
species?
– Do the plots show high levels of synteny (the
genes are in the same order or same place in
the genomes)?
Ec1 x Ec2 x Ec3
Ec1 x Ec2 x Se1
Ec1 x Ec2 x Bc1
Ec1 x Ec2 x Ba1
Table 1. TaxPlot percent better scores
       ec1   ec2   ec3   se1   se2   se3
ec1    100    75    85    62    62    61
ec2     38   100    36    63    63    63
ec3     83    73   100    76    75    76
se1     38    38    38   100    30    11
se2     71    71    71    80   100    94
se3     50    60    61    71    77   100
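One way to compare the within-species and between-species blocks of Table 1 is to average the off-diagonal cells in each block. The sketch below assumes a plain mean is a fair summary, which is our choice and not something the table itself specifies; the helper names are hypothetical.

```python
genomes = ["ec1", "ec2", "ec3", "se1", "se2", "se3"]

# Rows of Table 1, keyed by row label; values follow the column order in `genomes`.
table1 = {
    "ec1": [100, 75, 85, 62, 62, 61],
    "ec2": [38, 100, 36, 63, 63, 63],
    "ec3": [83, 73, 100, 76, 75, 76],
    "se1": [38, 38, 38, 100, 30, 11],
    "se2": [71, 71, 71, 80, 100, 94],
    "se3": [50, 60, 61, 71, 77, 100],
}


def species(name: str) -> str:
    """Return the species prefix of a genome label ('ec' or 'se')."""
    return name[:2]


def group_mean(row_species: str, col_species: str) -> float:
    """Mean of the off-diagonal cells whose row and column belong to the given species."""
    values = [
        table1[row][j]
        for row in genomes
        for j, col in enumerate(genomes)
        if row != col and species(row) == row_species and species(col) == col_species
    ]
    return sum(values) / len(values)


for r in ("ec", "se"):
    for c in ("ec", "se"):
        print(f"rows {r} x columns {c}: mean = {group_mean(r, c):.1f}")
```

The self-comparisons (the 100s on the diagonal) are left out so they do not inflate the within-species averages.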
Table 2. TaxPlot raw data
taxa1  taxa2  taxa3  query  hits  equal  above  below
ec1    ec2    ec3     4238  3823    786    652   2385
ec2    ec1    ec3     4629  3981   1906   1089    986
ec3    ec1    ec2     4783  4005    834   2581    590
se1    se2    se3     4510  3011   1523    876    612
se2    se1    se3     5604  3273    231    763   2279
se3    se1    se2     5386  3828    202    215   3411
ec1    se1    se2     4238  3410    303    981   2126
ec1    se3            4238  2944    330   1141   1472
ec2    se1    se2     4629  3637    321   1059   2257
ec2    se3            4629  3173    361   1260   1552
ec3    se1    se2     4783  3556    322   1031   2203
ec3    se3            4783  3029    337   1177   1515
se1    ec1    ec2     4510  3281    845   1205   1231
se1    ec3            4510  3305   1802   1231    726
se2    ec1    ec2     5604  3630    903   1337   1390
se2    ec3            5604  3741   1921    919    901
se3    ec1    ec2     5386  3572    873   1308   1391
se3    ec3            5386  3624   1901    883    840
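The raw counts can be turned into percentages like those in Table 1. The sketch below assumes that "above" counts proteins scoring better against taxa2 and "below" counts proteins scoring better against taxa3, and that a genome's percentage is (equal + better) / hits x 100; the function name is hypothetical.

```python
def percent_equal_or_better(equal: int, better: int, hits: int) -> float:
    """Share of hits (in percent) that are equal or better for one comparison genome."""
    return 100 * (equal + better) / hits


# First three rows of Table 2: (taxa1, taxa2, taxa3, hits, equal, above, below).
rows = [
    ("ec1", "ec2", "ec3", 3823, 786, 652, 2385),
    ("ec2", "ec1", "ec3", 3981, 1906, 1089, 986),
    ("ec3", "ec1", "ec2", 4005, 834, 2581, 590),
]

percentages = {}
for taxa1, taxa2, taxa3, hits, equal, above, below in rows:
    # In these rows the three categories account for every hit.
    assert equal + above + below == hits
    percentages[(taxa1, taxa2)] = round(percent_equal_or_better(equal, above, hits))
    percentages[(taxa1, taxa3)] = round(percent_equal_or_better(equal, below, hits))

for (query, other), pct in percentages.items():
    print(f"query {query}, compared with {other}: {pct}%")
```

For the first row this is the same arithmetic as (786 + 652) / 3823 x 100, which rounds to 38.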
Table 3. TaxPlot pairwise percent best scores
ec within             %best
ec1    ec2     652       21
ec1    ec3    2385       79
ec2    ec1    1089       52
ec2    ec3     986       48
ec3    ec1    2581       81
ec3    ec2     590       19
se within             %best
se1    se2     876       59
se1    se3     612       41
se2    se1     763       25
se2    se3    2279       75
se3    se1     215        6
se3    se2    3411       94
(786 + 652) / 3823 x 100
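The %best values in Table 3 can be reproduced from the counts alone. The sketch below assumes %best is each count divided by the sum of the two counts for that genome, with the equal hits left out; that reading is inferred from the numbers and the function name is hypothetical.

```python
def percent_best(count_a: int, count_b: int) -> tuple[int, int]:
    """Split the non-equal hits between two comparison genomes, as rounded percentages."""
    total = count_a + count_b
    return round(100 * count_a / total), round(100 * count_b / total)


# (genome, partner A, count A, partner B, count B), counts copied from Table 3.
rows = [
    ("ec1", "ec2", 652, "ec3", 2385),
    ("ec2", "ec1", 1089, "ec3", 986),
    ("ec3", "ec1", 2581, "ec2", 590),
    ("se1", "se2", 876, "se3", 612),
    ("se2", "se1", 763, "se3", 2279),
    ("se3", "se1", 215, "se2", 3411),
]

results = {}
for genome, partner_a, n_a, partner_b, n_b in rows:
    pct_a, pct_b = percent_best(n_a, n_b)
    results[(genome, partner_a)] = pct_a
    results[(genome, partner_b)] = pct_b
    print(f"{genome}: {partner_a} {pct_a}%  |  {partner_b} {pct_b}%")
```

Each pair of percentages sums to 100 because only the two "better" categories are compared.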
What other ways could we
visualize these data?
[Diagram labels: 100 % Proteome similarity; Ec; Se; Within Ec; Between Ec x Se; Within Se; Between Se x Ec]
DUE NEXT LAB
Taxplot results
• What did taxplot tell you about the comparison
of genomes within your two species?
• What did taxplot tell you about the comparison
of genomes between your two species?
• How can we use these results to refine our
study?
Put in lab notebook
FROM LAST TIME
Alignments
• You should have with you alignments produced by
CLUSTALW for each of your proteins
– Was CLUSTALW straightforward to use?
• If not, why?
– Did you have problems entering your data?
• If yes, how did you solve them?
– Did adjusting the gap and extension penalties help to
improve your alignments?
• If yes, which proteins?
– Do you feel your alignments are the best they can be?
• If yes, why?
• If not, what should you do?
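To see why the gap and extension penalties matter, it can help to score two candidate alignments of the same pair of sequences by hand. The sketch below is not how CLUSTALW scores alignments: it assumes a simple +1/-1 match/mismatch score in place of a substitution matrix, charges the opening penalty for the first position of a gap and the extension penalty for each further position, and uses made-up sequences and penalty values.

```python
def affine_gap_score(seq_a: str, seq_b: str, gap_open: float, gap_extend: float,
                     match: float = 1.0, mismatch: float = -1.0) -> float:
    """Score a gapped pairwise alignment with an affine gap penalty.

    Simplification: consecutive gap columns count as one gap run,
    regardless of which sequence carries the '-'.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    score = 0.0
    in_gap = False
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            # First gap column pays the opening penalty, later ones the extension penalty.
            score -= gap_extend if in_gap else gap_open
            in_gap = True
        else:
            score += match if a == b else mismatch
            in_gap = False
    return score


# Two hypothetical alignments of the same sequences: one long gap vs two short gaps.
one_gap = affine_gap_score("MKTAYIAK", "MK--YIAK", gap_open=10, gap_extend=0.5)
two_gaps = affine_gap_score("MKTAYIAK", "M-K-YIAK", gap_open=10, gap_extend=0.5)

print(f"one gap of length 2:  {one_gap}")
print(f"two gaps of length 1: {two_gaps}")
```

With a high opening penalty, the alignment with a single longer gap scores better than the one with two short gaps; lowering the opening penalty narrows that difference, which is why trying several penalty values can change which alignment you get.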
FROM LAST TIME
Alignments
• Let’s look at your 12 alignment files
– Print them out if you have not already done so
• How well does each protein align within one
species?
• How does the alignment change when
compared between two species?
• Can we compare any of these proteins
between all of our species (~30 species)?
FROM LAST TIME
Alignments
• Please make sure that Peg or Michelle looks
at your alignments and helps you to make
them as robust as possible
– In some cases, we may choose to exclude a
protein from analysis
– In some cases, we may urge you to delete a taxon
– In some cases, we may urge you to try more gap
and/or extension penalty values
• The goal for today is to finalize our
alignments
FROM LAST TIME
Alignments
• Your initial alignments are due today in
class. Please print them out and hand
them in (they probably won’t be perfect;
that is okay, we will work on them in
class today)
• Your finalized alignments are due on
Thursday in class. Please print them out
and hand them in on Thursday