Transcript Slide 1

Integration of Tools
Summary
The tour "How to cope with overwhelming information?"
described how difficult it sometimes is to get genome
analysis tools to work together. The present tour
shows that the task is certainly not impossible.
PhAnToMe / BioBIKE offers a common interface in
which the results of one tool may be used as input to the
next.
In this example, a set of proteins defined by the results
of a Blast search are aligned, and the alignment is
used to make a phylogenetic tree.
This is best viewed as a slide show.
To view it, click Slide Show on the
top tool bar, then View Show.
Click to start.
• How to get to PhAnToMe / BioBIKE (slides 4–8)
• Problem: Find, characterize rII-like proteins (slides 9–64)
  • Examine bacteriophage T4 genome (slides 10–18)
  • Define set of proteins similar to rII, per Blast (slides 19–37)
  • Align rII-like proteins (slides 38–51)
  • Make phylogenetic tree of rII-like proteins (slides 52–64)
• Reflections and coming attractions (slide 65)
To navigate to a specific slide, type the slide number and press Enter
(works only within a Slide Show)
There are more tools useful in studying genomes than
anyone would care to learn. It is often advantageous
to combine tools, but doing so can be difficult.
This problem is illustrated in the tour:
How to cope with overwhelming information?
PhAnToMe/BioBIKE attempts to remove logistical
barriers in combining tools, as illustrated in this tour.
(Tools pictured: Blast, Clustal, Phylip)
PhAnToMe/BioBIKE can be accessed by going
to the PhAnToMe web site at www.phantome.org
and mousing over the Tools menu.
Be sure you are using Firefox. BioBIKE will not function with other browsers.
Then click The Phage BioBIKE
Enter your e-mail address
and click New Login
The first time you log in, you'll be asked
for identifying information. This is so that
any changes you make in the database are
associated with you.
After filling in the fields, click Register.
An alternate route is through the BioBIKE portal at http://biobike.csbc.vcu.edu
However you get to BioBIKE, this is what you’d see.
Now suppose that your goal is to characterize proteins
similar to the rII protein of bacteriophage T4 (if you’ve
never heard of this protein, no matter). Specifically:
- Find such proteins
- Align them
- Make a phylogenetic tree
First, let’s take a look at phage T4.
To do that, mouse over
the Genome button…
…and click SEQUENCE-OF.
The SEQUENCE-OF function
appears in the workspace.
This function displays/returns
the sequence of a gene, protein,
genome, contig, replicon, or any
arbitrary sequence you provide.
To tell the function which
sequence you want to see,
click the entity box,
selecting it for entry.
The entity box turns white and a
cursor appears. You can type in the
box, but unless you know the exact
name of the phage, it's easier to pull
the name off a menu.
We want an organism (BioBIKE
treats phages as organisms),
so mouse over the
Organisms button…
…mouse over the
bacteriophage menu.
Scroll through the menu
until you find phage T4.
Note that the phages are arranged
alphabetically by their host.
Click T4 to bring it into the
SEQUENCE-OF function.
Now the function is complete
(no open white boxes).
Mouse over the function’s
action icon (the green wedge
in the upper left corner)…
…and click Execute.
Colored gene sequences are
presented within the context of
the genome and its annotation.
You can scroll through the
genome, or search for specific
genes or sequences, but for
now, just X out of the
sequence viewer.
(but first note or copy the name
of the rIIA gene, T4p001)
Problem
- Find such proteins
- Align them
- Make a phylogenetic tree
That was interesting, but...
What was the problem again?
OK. First step: find proteins with sequences
similar to that of T4p001. To do this, mouse
over the Strings-Sequences button…
…and click SEQUENCE-SIMILAR-TO
SEQUENCE-SIMILAR-TO allows
a few ways of finding similar
sequences, but the most common
is BLAST (the default choice).
Like BLAST, the function needs a
query sequence. Click the query box,
and type the name of the gene T4p001
(don't worry about upper/lower case).
Then press Enter to close the box.
If you executed the function as it
stands, it would search (by default)
for protein matches.
But if you didn't know this, you
could specify explicitly what
kind of search you want.
To do this, mouse over
the Options icon…
…click Protein-vs-Protein
(equivalent to BlastP),
and click Apply.
It’s possible to limit the search to
different classes of proteins, but
we’ll just accept the default – all
proteins from all organisms and
phages within PhAnToMe.
The function is complete,
so execute it. One way is to double-click the name of the function,
SEQUENCE-SIMILAR-TO.
But this time we'll do it the same way
as before, through the action icon.
Click Execute on the action menu.
The function displays the
results in a popup window
for human consumption,
but it also shows the result
in the Result Pane (this
shows what is available
for future computation).
There are evidently a great many
proteins known that are similar to
p-T4p001 (the protein encoded
by the gene T4p001).
Let's use this result.
First X out of the pop-up display.
The list of proteins can be used
directly (e.g., to make an alignment),
but it is better practice to give the
list a name so that you can recall
later what you did.
To give it a name, mouse
over the Definition button…
…and click DEFINE.
The DEFINE function asks
for two things from you:
the values you want to name, and
the name of the variable that will
contain these values. The name
can be anything you'll remember
(upper/lower case doesn't count).
First the name of the variable.
Click var to open up the
variable box.
Type a name that makes sense
(I chose rII-like) and press
Enter to close the box.
(The function cannot be executed
if any box is open for entry)
Next, the values. They were produced
by the function you just executed.
Drag that function by
clicking and holding
the name of the function,
SEQUENCE-SIMILAR-TO.
…and dragging it
towards the value box
When it reaches the value
box, the box will become
highlighted in red. At that
point, release the mouse…
…and the function will now reside
in the value box.
Execute this function
as you have the others,…
…by clicking Execute on the
function's Action menu.
Be careful not to use the action
menu of the inner function
SEQUENCE-SIMILAR-TO.
That will work (it runs the
sequence comparison), but no
definition will take place.
Nothing drastic seems to have
happened, but if you look
carefully, you'll note two changes.
First, a list of phages has
appeared in the Result pane.
Second, a new Variables
button has appeared.
We'll use it momentarily.
We wanted to use the Blast
results, now stored in rII-like.
…for what?
Ah yes! The time has come
to align the protein sequences.
To do that, mouse over the
Strings-Sequences menu…
Problem
- Find such proteins
- Align them
- Make a phylogenetic tree
…and mouse over
Bioinformatic-Tools….
…and click ALIGNMENT-OF.
The ALIGNMENT-OF function
asks for a sequence list.
Fortunately, you now have one.
Click the sequence-list box…
…and mouse over your
new Variables button…
…and click your new
variable rII-like button
to bring it into the box.
The function is now ready for
execution, but there are two ways
you can tweak the function settings
to make the output more useful.
To make these changes,
mouse over the Options icon…
…and click colored to produce
a graphical alignment rather
than pure text…
…and click Label-with-organism
to cause the alignment lines to be
labeled with the names of the
proteins' organisms rather than the
proteins themselves.
Finally, click Apply…
…and go to the action icon…
…to execute the completed function.
The graphical output is produced
by a Java Applet called Jalview.
Activate the applet. It might
take several seconds to
complete the alignment.
A useful alignment, perhaps.
Now on to the phylogenetic tree.
First, X out of the alignment.
Back to the
Strings-Sequences menu…
Go to the
Phylogenetic Tree submenu…
… and click TREE-OF.
Note that TREE-OF is
asking for an alignment.
Provide one by dragging the completed
ALIGNMENT-OF function into the
alignment box.
Click and hold the
ALIGNMENT-OF box…
…and drag it towards its
target, the alignment box.
You'll know you've gotten there
when it becomes highlighted.
Release the function.
The Colored option is no longer
useful (the output it provides
is just for human consumption,
not for TREE-OF).
Get rid of it by clicking its Delete icon.
You may have noticed that the
alignment you produced before had
many columns that were mostly
gapped. These are given too much
weight by phylogeny programs.
To remove those columns, modify the
behavior of ALIGNMENT-OF by
mousing over its Options icon…
…clicking the
No-gapped-columns option…
…and finally clicking Apply.
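The tour describes the No-gapped-columns option only by its effect. As an illustration, here is a minimal Python sketch of the idea. It assumes the alignment is a dict mapping labels to equal-length aligned strings and that "-" or "." marks a gap. The function name and the max_gap_fraction cutoff are hypothetical choices of this sketch; the tour does not say exactly how ALIGNMENT-OF decides which columns to drop.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5, gap_chars="-."):
    """Return a copy of `alignment` without columns whose gap fraction exceeds the cutoff.

    Hypothetical helper: `alignment` maps a label to an aligned sequence, and all
    sequences must have the same length. max_gap_fraction=0.0 removes every
    column that contains any gap (an assumed cutoff, not BioBIKE's).
    """
    labels = list(alignment)
    seqs = [alignment[label] for label in labels]
    if not seqs:
        return {}
    width = len(seqs[0])
    if any(len(s) != width for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    # Keep a column only if the fraction of gapped rows is at or below the cutoff.
    keep = [
        col for col in range(width)
        if sum(s[col] in gap_chars for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {label: "".join(s[c] for c in keep) for label, s in zip(labels, seqs)}


aln = {"A": "MK-TAY", "B": "MKLT-Y", "C": "M--TAF"}  # toy, hypothetical data
print(drop_gappy_columns(aln))  # {'A': 'MKTAY', 'B': 'MKT-Y', 'C': 'M-TAF'}
print(drop_gappy_columns(aln, max_gap_fraction=0.0))  # {'A': 'MTY', 'B': 'MTY', 'C': 'MTF'}
```

With the default cutoff, only the third column (gapped in two of three rows) is removed. A stricter cutoff discards more data but leaves fewer positions where gaps can dominate the tree.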
Now you're ready to
execute in the usual way.
(This will take longer than
the alignment – perhaps
a few dozen seconds)
You should soon receive
in separate popup windows a
phylogenetic tree based on
the no-gaps alignment
of the rII-like sequences.
As one might expect, the rII
protein from phage T4 clusters
with proteins from other
enterobacteriophages.
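This tour does not say which tree-building method TREE-OF uses. To illustrate how an alignment can become a tree, here is a sketch of one of the simplest distance methods, UPGMA. It works on p-distances: the fraction of mismatched residues, counted only at positions where neither sequence has a gap. This is not necessarily what TREE-OF does. UPGMA assumes roughly equal evolutionary rates, and the function names, toy sequences, and labels below are hypothetical.

```python
from itertools import combinations

GAP_CHARS = "-."  # assumption: these characters mark gaps


def p_distance(a, b):
    """Fraction of differing residues over columns where neither sequence is gapped."""
    compared = differing = 0
    for x, y in zip(a, b):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue  # skip any column with a gap in either sequence
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("sequences share no ungapped columns")
    return differing / compared


def upgma_newick(alignment):
    """Cluster aligned sequences by UPGMA; return a Newick topology (no branch lengths)."""
    if not alignment:
        raise ValueError("alignment is empty")
    labels = list(alignment)
    # cluster id -> (Newick text for the subtree, number of leaves in it)
    clusters = {i: (label, 1) for i, label in enumerate(labels)}
    # pairwise distances keyed by (smaller id, larger id)
    dist = {(i, j): p_distance(alignment[labels[i]], alignment[labels[j]])
            for i, j in combinations(range(len(labels)), 2)}

    def d(i, j):
        return dist[(i, j)] if i < j else dist[(j, i)]

    next_id = len(labels)
    while len(clusters) > 1:
        # merge the closest pair (ties go to the first pair in sorted id order)
        i, j = min(combinations(sorted(clusters), 2), key=lambda pair: d(*pair))
        text_i, n_i = clusters.pop(i)
        text_j, n_j = clusters.pop(j)
        # UPGMA: distance to the new cluster is the leaf-count-weighted average
        for k in clusters:
            dist[(k, next_id)] = (d(i, k) * n_i + d(j, k) * n_j) / (n_i + n_j)
        clusters[next_id] = (f"({text_i},{text_j})", n_i + n_j)
        next_id += 1
    (text, _), = clusters.values()
    return text + ";"


toy = {  # hypothetical, already gap-filtered sequences
    "A": "MKTAYIAK",
    "B": "MKTAYLAK",
    "C": "MRSAFLGQ",
    "D": "MRSAFLGK",
}
print(upgma_newick(toy))  # ((A,B),(C,D));
```

Closely related sequences (A with B, C with D) are joined first. This mirrors, on a toy scale, how the T4 rII protein groups with its relatives from other enterobacteriophages.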
Reflections and Coming Attractions
This tour presented three of the most common bioinformatic
tools employed by biological researchers:
searching by local alignment (Blast), multiple sequence
alignment, and construction of phylogenetic trees. There are,
of course, many, many more tools a researcher may find
valuable, and the collective burden can be overwhelming.
The tour made the case that much is gained by putting the
tools within a single interface, BioBIKE. Granted,
BioBIKE has its own idiosyncrasies to learn, but at least
it’s just one set.
The interface that permits access to multiple tools and
databases also permits the creation of new tools conceived
by a researcher to address an immediate need, and this topic
is explored in the tour, Creating New Tools.