Transcript Homework

CS177 homework assigned March 2
•this can be either a group or individual assignment,
whichever is easier for you
•this will cover
–multiple alignment
–html and java
•this is a lot of stuff, so it is due in 2 weeks
•the balance should start shifting from assigned
homework to doing your projects
“vertical” multiple alignment
• pick a gene, find the human mRNA RefSeq (i.e.,
NM_XXXX), and query NCBI
nucleotides using BLAST
– see if you get hits to 5 or 6 different species
– if not, try another gene
– pick out one good hit (i.e., low E value and a fairly long alignment)
for each species (including the original human RefSeq)
– submit these sequences to clustalw
– visually identify three columns in the alignment: one that
is highly conserved, one that is moderately
conserved, and one that is poorly conserved
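If you want to check your visual picks, the column idea can be sketched in Java. This is only an illustration, not part of the assignment: the class name ColumnConservation, the toy rows, and the 0.9 / 0.6 cutoffs are all assumptions made up for the example, and it assumes a reasonably modern JDK (8 or later). It scores a column by the fraction of sequences that share the most common non-gap letter.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: scores one column of an alignment whose rows all have equal length.
class ColumnConservation {

    // Fraction of rows that carry the most common non-gap character in this column.
    static double conservation(String[] rows, int col) {
        Map<Character, Integer> counts = new HashMap<>();
        int best = 0;
        for (String row : rows) {
            char c = Character.toUpperCase(row.charAt(col));
            if (c == '-') {
                continue; // a gap never counts as agreement
            }
            int n = counts.merge(c, 1, Integer::sum);
            best = Math.max(best, n);
        }
        return (double) best / rows.length;
    }

    // Arbitrary cutoffs chosen for illustration only.
    static String label(double score) {
        if (score >= 0.9) return "highly conserved";
        if (score >= 0.6) return "moderately conserved";
        return "poorly conserved";
    }

    public static void main(String[] args) {
        // Toy rows, not real sequences; paste in rows from your own clustalw output.
        String[] rows = { "ATGCA", "ATGAA", "ATCTA", "ATG-A", "ATTGA" };
        for (int col = 0; col < rows[0].length(); col++) {
            double s = conservation(rows, col);
            System.out.printf("column %d: %.2f %s%n", col + 1, s, label(s));
        }
    }
}
```

The cutoffs are a judgment call; with only 5 or 6 species, a single mismatch already moves a column a long way.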
“horizontal” multiple alignment
• go to the Pfam site
– http://www.sanger.ac.uk/Software/Pfam/
– look it over until you are completely confused
– here is my example
• run through it
• then do your own example
• enter “fibrin” in “keywords” box
• click on “kringle”
• click on “view species tree”
• click in box next to homo sapiens and then click “view selected species alignment”
• meditate on what you are seeing
– amino acids, not nucleotides
– uses the one letter symbol format for amino acids
– can you make any observations about the multiple alignments?
• try it in a second browser window for gorilla and compare with
human
• ditto for mouse
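One way to put a number on the human versus gorilla (or mouse) comparison is percent identity between two aligned rows. The sketch below is an illustration only: the class name AlignedIdentity and the two short rows are made up, and it assumes both rows come from the same alignment (equal length) with gaps written as '-' or '.'.

```java
// Hypothetical helper: compares two rows taken from the same protein alignment.
class AlignedIdentity {

    static boolean isGap(char c) {
        return c == '-' || c == '.';
    }

    // Percent of gap-free columns in which the two rows have the same amino acid letter.
    static double percentIdentity(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("rows must have the same aligned length");
        }
        int compared = 0;
        int same = 0;
        for (int i = 0; i < a.length(); i++) {
            char x = Character.toUpperCase(a.charAt(i));
            char y = Character.toUpperCase(b.charAt(i));
            if (isGap(x) || isGap(y)) {
                continue; // skip columns where either row has a gap
            }
            compared++;
            if (x == y) {
                same++;
            }
        }
        return compared == 0 ? 0.0 : 100.0 * same / compared;
    }

    public static void main(String[] args) {
        // Made-up rows in one letter amino acid code, not real kringle sequences.
        String first = "CYHG-NG";
        String second = "CYRGDNG";
        System.out.printf("identity: %.1f%%%n", percentIdentity(first, second));
    }
}
```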
java
• take a look at the NCBI_STRUCTURES.java program
• go to the web site from the last homework
– http://java.sun.com/j2se/1.3/docs/api/index.html
• see if you can find something in the web site that helps you make sense
out of one or two things in the NCBI_STRUCTURES.java program
– hint: Look at the URLConnection class
– just spend 30 or 40 minutes on this. don’t get too frustrated now; you will have plenty of time for that once you get a real job
– think of this as a growth experience that builds character
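To make the URLConnection hint concrete, here is a minimal sketch of how that class is typically used to read a page line by line. It is not taken from NCBI_STRUCTURES.java; the class name FetchPage and the method readLines are made up for illustration, and the code is written against a current JDK rather than the 1.3 API in the link above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class: opens a URL and returns its content as lines of text.
class FetchPage {

    static List<String> readLines(URL url) throws IOException {
        URLConnection conn = url.openConnection(); // no data is read until the stream is opened
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java FetchPage <url>");
            return;
        }
        // Works with http: URLs and also file: URLs, which is handy for offline testing.
        for (String line : readLines(URI.create(args[0]).toURL())) {
            System.out.println(line);
        }
    }
}
```

Looking up openConnection and getInputStream in the API docs is a reasonable way to spend the 30 or 40 minutes.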
java and html - we will do this in class next week, but
you will have to do it on your own also
• the NCBI_STRUCTURES.java program can be used as a prototype for this part
• go to NCBI web site
• view the html source code underlying the web page
• locate the form action POST stuff
1. look at the stuff that happens between the <form and the </form tags
2. copy the html source into a file, and change POST to GET, save as NCBI.html
• open another browser window, and read in NCBI.html using the File menu
• perform a query type of your choice
• this will not actually work since POST is expected, but notice the stuff in the URL window that is exposed by using GET
3. copy the html source into a file, and change the NCBI URL into the URL for my cgi program testloop.cgi, save as NCBIecho.html
– repeat the last 3 steps, and see if the echo is the same as the GET
4. modify NCBI_STRUCTURES.java
• make it put out what you need for your query
• modify the part that does the parsing (i.e., the line with <dd>) to make it relevant for parsing your output
– hint: figure out the modification by looking at the real output html source from a real query at the real NCBI site
• if you cannot figure out how to modify the parsing, then at least comment it out entirely or you will not see any output!!
• remember that the java program is run as:
java NCBI_STRUCTURES inputfilename
– inputfilename is the name of the input file that has 4 or 5 gene names to test out
5. remember that first you need to run javac NCBI_STRUCTURES.java to get NCBI_STRUCTURES.class
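The two ideas in steps 2 through 4 (what GET exposes in the URL, and picking text out of a <dd> line) can be sketched as below. This is not the code in NCBI_STRUCTURES.java. The class name QuerySketch, the action URL http://example.org/query, and the field names db and term are all hypothetical placeholders; use whatever you find between the <form and </form tags. It assumes Java 10 or later for the URLEncoder call and a lower-case <dd> tag.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: builds a GET style URL from form fields and extracts <dd> text.
class QuerySketch {

    // Appends name=value pairs to the form action, encoded the way a browser does for GET.
    static String buildGetUrl(String action, Map<String, String> fields) {
        StringBuilder sb = new StringBuilder(action);
        char sep = '?';
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append(sep)
              .append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
            sep = '&';
        }
        return sb.toString();
    }

    // Returns the text after <dd> (up to </dd> or end of line), or null if the line has no <dd>.
    static String ddText(String line) {
        int start = line.indexOf("<dd>");
        if (start < 0) {
            return null;
        }
        start += "<dd>".length();
        int end = line.indexOf("</dd>", start);
        if (end < 0) {
            end = line.length();
        }
        return line.substring(start, end).trim();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("db", "structure");      // placeholder field name and value
        fields.put("term", "BRCA1 human");  // placeholder query term
        System.out.println(buildGetUrl("http://example.org/query", fields));
        System.out.println(ddText("<dd>some result text</dd>"));
    }
}
```

If the real result page marks its hits with a different tag, only ddText needs to change.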