Transcript Slide 1

What distribution of peptides results
from digesting proteins with trypsin?
This tour guides you through a
computational experiment that you can
perform within BioBIKE.
To get to BioBIKE, go to:
http://ixion.csbc.vcu.edu:8003/biologin
Enter a login name (letters only, no spaces)
No password necessary
This demonstration is best viewed as a slide show,
enabling you to simulate a session and make
changes in cursor position more obvious.
To do this, click Slide Show on the top tool bar, then View show.
Click anywhere to go on to the next slide.
What do you get when you cut all the
proteins of an organism with trypsin,
a proteolytic enzyme that cuts proteins
after lysine (K) and arginine (R)?
You get a mess of peptides, of course,
but how small? How many of each
size class?
We can answer the question through a
computational experiment: teaching
BioBIKE to be trypsin and applying
the enzyme to an organism's worth of
proteins.
Let's learn how to do this with one
protein sequence and apply the
lesson to all proteins of an organism.
First get a sequence of a protein.
Perhaps a protein from our favorite
cyanobacterium, ss120.
I'll choose p-pro0047. Type it in the
box and press Enter.
Click “Execute” in the Action
Menu to evaluate the expression
and see the sequence in the lower
frame.
Ok. There it is, in all its glory. Or
rather 200aa of its glory. But that is
neither here nor there. We want to cut
the sequence (or split it) after "K" and
"R". Select “Split” to begin thinking
like an enzyme.
It looks like BioBIKE would
appreciate a string here. Fair enough,
let’s give it one…
… a sequence is a string. My strategy
is to cut and paste the SEQUENCE-OF
command into the argument of SPLIT.
Click in the input box of SPLIT.
Over in our sequence-calling node,
select “Cut” to cut this function…
… and paste it into the SPLIT
function.
Done and done. Let’s give SPLIT a
little guidance by telling it where to
cut the string. Trypsin cuts after "K"
and "R".
Let's try "K" first. Enter "K" and
execute your SPLIT command to see
what it does.
The result of splitting the sequence
after every K is shown below. Take a
look at it. Is it right? How can you be
sure?
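One way to be sure is to reproduce the cut by hand
on a short sequence. The sketch below is a Python
illustration, not BioBIKE code. The name split_after
and the test sequence are made up for the example,
and it assumes each fragment keeps the residue it was
cut after, as a real tryptic peptide would. BioBIKE's
SPLIT may treat the cut letter differently, so compare
its output with this expectation.

```python
def split_after(sequence, residue):
    """Cut a sequence after every occurrence of one residue letter.

    Assumption: the cut residue stays at the end of its fragment.
    """
    fragments = []
    current = ""
    for letter in sequence:
        current += letter
        if letter == residue:
            # Close the current fragment right after the cut residue.
            fragments.append(current)
            current = ""
    if current:
        # Whatever follows the last cut is the final fragment.
        fragments.append(current)
    return fragments


# Hypothetical test sequence, short enough to check by eye.
example = "MAKLLKRGS"
print(split_after(example, "K"))
```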
But trypsin cuts at both "K" and "R".
You can tell SPLIT to split at either
letter by supplying a list of letters.
Erase "K" by clicking the red x.
Bring down the LIST function into the
input box of EVERY.
One list item comes automatically, but
we need two, one for each letter.
Type both letters into the input boxes.
“K” and “R” now specified. Robot
trypsin will cut after every Lysine and
Arginine. Hit execute to see what this
looks like.
Look carefully at the results. Are these
fragments what you expect? Should
there be others? Are there any
mistakes?
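To decide what to expect, it can help to work out the
two-letter cut on a sequence small enough to check by
eye. This is a Python sketch under the same assumption
as a real digest (each fragment keeps its final K or R).
The name split_after_any and the test sequence are
hypothetical, not part of BioBIKE.

```python
def split_after_any(sequence, residues):
    """Cut a sequence after every letter found in `residues`.

    Assumption: the cut residue stays at the end of its fragment.
    """
    cut_letters = set(residues)
    fragments = []
    current = ""
    for letter in sequence:
        current += letter
        if letter in cut_letters:
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


# Hypothetical test sequence with adjacent K and R.
example = "MAKLLKRGS"
print(split_after_any(example, ["K", "R"]))
```

Note the one-letter fragment that appears when K and R
are neighbors; short fragments like that are expected.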
Seems pretty variable… Our goal is to
get a distribution of lengths of
fragments produced by trypsin.
Strategy: wrap LENGTHS-OF around
the current function.
Choose “Surround with” and …
… click on LENGTHS-OF to evaluate
the length of each fragment. Don’t
forget we want the plural form,
LENGTHS-OF, and not the entry
directly above it in the menu.
Hit “Execute.”
And there they are: the length
of each fragment of the protein
p-Pro0047 as digested by Trypsin.
Do the numbers make sense?
Check by comparing these numbers
with the peptides of the previous
result. Do the lengths agree?
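A quick consistency check of this kind can be sketched
in Python. The fragment list is a made-up example, and
the check assumes the fragments keep their cut residues,
so that their lengths add up to the length of the
uncut sequence.

```python
# Hypothetical protein and its fragments after cutting at K and R.
protein = "MAKLLKRGS"
fragments = ["MAK", "LLK", "R", "GS"]

# One length per fragment, in the same order as the fragments.
lengths = [len(fragment) for fragment in fragments]
print(lengths)

# If no residues were lost, the lengths sum to the protein length.
print(sum(lengths) == len(protein))
```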
Now that we've taught BioBIKE how
to digest a single protein, let’s do all
the proteins in an organism.
Erase p-pro0047, click the input box,
and…
… and choose PROTEINS-OF from
the GENOME menu.
Enter ss120 into the Input Box of
PROTEINS-OF, and press Enter.
Then execute LENGTHS-OF again
(making sure to choose Execute from
the action menu of LENGTHS-OF) to
display the lengths of the fragments
produced by digesting all proteins in
ss120 by trypsin.
Are these numbers correct? Check
by… we seem to have skipped a step!
We haven't produced any segments to
compare the numbers to.
OK, do it now, by executing the
SPLIT function, using Execute off of
SPLIT's action menu.
Now you can check, comparing the
lengths of these sequences with the
numbers in the previous result.
This time the numbers and lengths don't
agree at all!
(31 31 75…) isn't (2 17 5…)
Why? One clue is the double parentheses
preceding the list. If you scroll through
your results, you'll find that the result is in
the form ((…)(…)…).
BioBIKE is actually returning the number
of elements in each sublist.
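The same effect is easy to reproduce outside BioBIKE.
In this Python sketch (the data are invented for
illustration), taking the length of each item in a list
of lists counts the fragments per protein rather than
the residues per fragment.

```python
# Hypothetical digest of two proteins: one sublist per protein.
digests = [["MAK", "LLK", "R", "GS"], ["MK", "TR"]]

# Length of each top-level item: number of fragments in each sublist.
sublist_sizes = [len(sublist) for sublist in digests]
print(sublist_sizes)  # [4, 2]

# What we actually wanted: the length of every fragment.
fragment_lengths = [[len(f) for f in sublist] for sublist in digests]
print(fragment_lengths)  # [[3, 3, 1, 2], [2, 2]]
```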
If those extra parentheses just went
away, the function should work. We
can do this with the SIMPLIFY-LIST
function, which goes around your
SPLIT node and combines all the
sublists into a single list.
Surround SPLIT…
…and wrap SIMPLIFY-LIST
around it.
Execute SIMPLIFY-LIST, just to
make sure it works before
executing LENGTHS-OF.
No more ((double parentheses))!
But does this do any good?
Execute the entire function.
Again, there they are.
Now do the numbers make sense?
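The flattening step can be illustrated in Python as
well. This is a sketch of the idea behind combining
sublists into a single list, not BioBIKE's own
implementation; the data are invented.

```python
from itertools import chain

# Hypothetical digest of two proteins: one sublist per protein.
digests = [["MAK", "LLK", "R", "GS"], ["MK", "TR"]]

# Combine all sublists into a single flat list of fragments.
flat = list(chain.from_iterable(digests))
print(flat)

# Now the length of each item is the length of a fragment.
lengths = [len(fragment) for fragment in flat]
print(lengths)
```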
Now that it seems to work, we want to
package the procedure into a function
so that we never have to think about it
again.
First we need to get more screen
space. Select “collapse” from the
SIMPLIFY-LIST node.
Bring down into the workspace
DEFINE-FUNCTION. It’s up in the
DEFINITION menu.
Give it a catchy name.
Then describe the function so you
(and perhaps others) can get a basic
idea of what it does…
…from a helpful summary.
Give a descriptive name to the
argument – the information the
function acts on.
TRYPSIN-DIGEST-OF will act on
proteins.
The body of the function is simply the
procedure we perfected using proteins
of ss120 as the test case.
Copy that function…
… and paste it into the body, using the
menu obtained by clicking the box's
green action icon.
Now that it's there, expand it
to see what we got.
Whoops! We left it working on a
specific case – all proteins of ss120.
We need to make it work on a general
case, whatever the user provides as the
argument, proteins. Delete the
PROTEINS-OF node.
…and click on the input box of
SEQUENCE-OF...
…to type “proteins,” which is the variable
we asked TRYPSIN-DIGEST-OF to look
for in the first place.
Executing DEFINE-FUNCTION adds our
function to BioBIKE.
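For comparison, here is how the same packaging might
look in Python. It is a sketch, not a translation of
BioBIKE internals: trypsin_digest_of mirrors the name
used in this tour, split_after_any is a hypothetical
helper, and the function is assumed to receive protein
sequences as plain strings.

```python
from itertools import chain


def split_after_any(sequence, residues):
    """Cut a sequence after every letter found in `residues`."""
    cut_letters = set(residues)
    fragments = []
    current = ""
    for letter in sequence:
        current += letter
        if letter in cut_letters:
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


def trypsin_digest_of(proteins):
    """Return one flat list of fragments from cutting every
    sequence in `proteins` after K and R."""
    per_protein = [split_after_any(seq, "KR") for seq in proteins]
    # Flatten the per-protein sublists into a single list.
    return list(chain.from_iterable(per_protein))


# Hypothetical input: two short sequences standing in for a proteome.
proteins = ["MAKLLKRGS", "MKTR"]
fragments = trypsin_digest_of(proteins)
lengths = [len(fragment) for fragment in fragments]
print(fragments)
print(lengths)
```

The specific test case has become the argument, so the
same function works on whatever sequences it is given.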
Check it out. No, seriously. Check it out.
If the function is correct, it should replace
what we had before and give exactly the
same result.
Start off by deleting what we had before.
Click on the input box of LENGTHS-OF,
and put inside of it our new function.
Notice that you can get the function from
the new FUNCTIONS button.
The function you made is now part of the
language.
Click on the input box of TRYPSIN-DIGEST-OF, and put inside of it
PROTEINS-OF (from the Genes-Protein
menu).
Fill its input box with ss120.
This is exactly what we did before (I
hope), except that the complicated
SIMPLIFY-LIST function is now
encapsulated in TRYPSIN-DIGEST-OF.
Execute it (the moment of truth).
Are these results the same as before?
Now that you've confirmed the results,
we can trust the new function (more or
less). So let's get rid of it.
Much less cluttered!
Now the problem is to analyze all those
numbers.
Strategy: Count how many there are in
each size class, then hand the results over
to Excel to make a graph.
The BIN-DATA-OF function will do the
counting, according to classes that I
define.
The function should act on the results I
just obtained. Click on the input box of
BIN-DATA-OF.
I could give those results a name, using
DEFINE, but in this case I'll refer to the
result using the PREVIOUS-RESULT
function.
Sometimes it's desirable to make bins that
combine different classes, but this time,
I'll put each size class in a separate bin.
So the bin-width is set to 1, and I make
a guess that I'm not going to find a
fragment bigger than 500 amino acids.
After typing in these values, I execute the
function.
The results tell me that there are 0
fragments of length 0 up to 1, 7958
fragments of length 1 up to 2, and so
forth.
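The counting itself is simple enough to sketch in
Python. The function bin_data below is a hypothetical
stand-in, not BIN-DATA-OF: it assumes each bin runs
from its lower bound up to, but not including, its
upper bound, and that values outside the range are
ignored. The lengths are invented.

```python
def bin_data(values, minimum, maximum, width):
    """Count how many values fall in each bin [lower, upper)."""
    n_bins = (maximum - minimum) // width
    counts = [0] * n_bins
    for value in values:
        index = (value - minimum) // width
        if 0 <= index < n_bins:
            counts[index] += 1
    # One (lower, upper, count) row per bin.
    return [
        (minimum + i * width, minimum + (i + 1) * width, count)
        for i, count in enumerate(counts)
    ]


# Hypothetical fragment lengths; bins of width 1 from 0 to 5.
lengths = [3, 3, 1, 2]
bins = bin_data(lengths, 0, 5, 1)
for row in bins:
    print(row)
```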
To save the result to a file for import into
Excel, I use the WRITE function.
The material to write is the previous
result, which…
… I can copy…
… and paste into the appropriate input
box.
The file-name is…
… whatever I wish, so long as the name
is in quotes. It is also desirable to give it a
.txt extension, so that it can be
automatically opened by Notepad or
similar.
Finally, Excel likes tab-delimited files, so
I choose that format...
… and execute the resulting function.
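For readers curious what a tab-delimited file looks
like, here is a Python sketch that writes binned data
in that format. The function name, the file name
"trypsin-bins.txt", and the rows are all hypothetical;
this is not how BioBIKE's WRITE is implemented.

```python
import csv
import os
import tempfile


def write_tab_delimited(rows, path):
    """Write rows to a text file, one row per line, tab-separated."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerows(rows)


# Hypothetical binned data: (lower bound, upper bound, count).
bins = [(0, 1, 0), (1, 2, 1), (2, 3, 1)]

# Write to the system's temporary directory for this demonstration.
path = os.path.join(tempfile.gettempdir(), "trypsin-bins.txt")
write_tab_delimited(bins, path)

with open(path) as handle:
    print(handle.read())
```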
Executing the function writes the binned data
to my personal file space on the BioBIKE
server. To see this file and to download it
to my own computer, I go to the BioBIKE
Files menu.
This gets me to the directory of my
personal file space.
Clicking on the file gets me a view of the
file. I can use the usual browser controls
to download the file to my own computer.
Now you can make an x-y scatter plot
to visualize the distribution of trypsin
fragments resulting from the digestion
of all proteins of Prochlorococcus ss120.
What distribution of peptides results
from digesting proteins with trypsin?
In this tour, you've seen:
- How to simulate a digestion of a protein by trypsin.
- How to digest all proteins of an organism at once.
- How to package a useful procedure into a new function.
- How to format numeric results for a histogram plot.
and a very important general lesson:
- The importance of checking every result to avoid being
fooled by the computer.