NGS Bioinformatics Workshop
1.5 Tutorial – Genome Annotation
April 5th, 2012
IRMACS 10900
Facilitator: Richard Bruskiewich
Adjunct Professor, MBB
Workflow for Today
Prepare to visualize annotation
Get a genomic sequence from GenBank
Repeat mask it.
Retrieve a genomic sequence…
Retrieve a relatively small (<100 kb) eukaryotic
genomic sequence clone from GenBank
Query the Nucleotide division, e.g. Arabidopsis BAC
clone (HE601748.1)
Select FASTA
Save.. To File.. As “Fasta” (rename?)
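The same retrieval can also be scripted. The following is a minimal sketch, assuming the NCBI E-utilities efetch service is used instead of the web page shown in the slides; the helper names `efetch_fasta_url` and `download_fasta` and the output file name are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession):
    """Build an efetch URL asking for a nucleotide record in FASTA format."""
    params = {
        "db": "nuccore",      # the Nucleotide database
        "id": accession,      # e.g. the BAC clone accession from the slide
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def download_fasta(accession, path):
    """Fetch the record over the network and save it to a local file."""
    with urlopen(efetch_fasta_url(accession)) as response, open(path, "wb") as out:
        out.write(response.read())
```

For example, `download_fasta("HE601748.1", "HE601748.fasta")` would save the clone under a file name of your choice (network access required).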
BLAST is low-hanging fruit…
Use BLAST to quickly survey for similar
sequences
Megablast against nucleotide
e.g. HE601748 is closest to A. thaliana chr. 5?
Megablast against reference RNA sequence db
Repeat Masking
Upload the clone file to RepeatMasker on the
web and run with appropriate parameters:
http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker
Save the results (including the masked
sequence) to your computer
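Once the masked sequence is saved, a quick sanity check is to see how much of the clone was masked. This is a rough sketch, assuming masked bases appear as N, X, or lowercase letters in the saved FASTA file; the function names are hypothetical, and any N already present in the original sequence will be counted too.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def masked_fraction(seq):
    """Return the fraction of bases that look masked (N, X, or lowercase)."""
    if not seq:
        return 0.0
    masked = sum(1 for base in seq if base in "NnXx" or base.islower())
    return masked / len(seq)
```

Usage: open the masked file, pass the file handle to `read_fasta`, and call `masked_fraction` on each sequence.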
ab initio Gene Predictions
Genscan:
http://genes.mit.edu/GENSCAN.html
Cut and paste results as text to a file
Fgenesh:
www.softberry.com
Blast2GO
http://www.blast2go.com
Annotation workbench, via Gene Ontology (GO) terms.
First, save the predicted peptides (e.g. from fgenesh)
need to fix the FASTA headers to assign proper identifiers
(could write a script?)
(Java web) start blast2go workbench
Load in peptides
Do the analysis… e.g. run blastp, GO mapping, annotation,
InterPro, etc.
See www.geneontology.org for details on GO
http://www.ebi.ac.uk/interpro/ for interpro info
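The "could write a script?" step for fixing FASTA headers might look like the sketch below. It assumes nothing about the exact header layout produced by the gene predictor: each header is simply given a numbered identifier, and the original header text is kept as the description. The function name and the identifier prefix are hypothetical.

```python
def rename_fasta_headers(lines, prefix):
    """Yield FASTA lines with each header replaced by '<prefix>_<number>'.

    The original header text is kept after the new identifier, so no
    information from the prediction output is lost.
    """
    count = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            count += 1
            original = line[1:].strip()
            yield f">{prefix}_{count:04d} {original}".rstrip()
        else:
            yield line


def rename_fasta_file(in_path, out_path, prefix):
    """Rewrite a peptide FASTA file with clean, unique identifiers."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in rename_fasta_headers(src, prefix):
            dst.write(line + "\n")
```

For example, `rename_fasta_file("peptides.fasta", "peptides_renamed.fasta", "HE601748_pep")` would produce identifiers such as `HE601748_pep_0001`; the file names and prefix here are placeholders.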
EMBOSS
European Molecular Biology Open Software
Suite (EMBOSS):
http://emboss.sourceforge.net
Download and install the version for your platform (e.g.
Linux, Mac OS X, Windows…)
Decide what to do:
http://emboss.sourceforge.net/apps/groups.html
Let’s try a CpG island plot (cpgplot)
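To see what a CpG island plot is measuring, the sliding-window statistics can be sketched in a few lines. This is an illustration only, not the EMBOSS implementation: it uses the usual observed/expected CpG ratio, (number of CG × window length) / (number of C × number of G), together with percent G+C. The function names, window size, and step are assumptions.

```python
def cpg_stats(window):
    """Return (percent G+C, observed/expected CpG ratio) for one window."""
    window = window.upper()
    n = len(window)
    c = window.count("C")
    g = window.count("G")
    cpg = window.count("CG")
    gc_percent = 100.0 * (c + g) / n if n else 0.0
    # The expected CpG count is (C * G) / n, so obs/exp = cpg * n / (C * G).
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_percent, obs_exp


def sliding_cpg(seq, window=100, step=1):
    """Yield (start, percent G+C, obs/exp) for each window along the sequence."""
    for start in range(0, len(seq) - window + 1, step):
        gc_percent, obs_exp = cpg_stats(seq[start:start + window])
        yield start, gc_percent, obs_exp
```

Windows with both a high G+C percentage and a high observed/expected ratio are the candidate CpG island regions; check the cpgplot documentation for the thresholds it actually applies.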
Study Genes by Comparative Genomics
JGI VISTA toolkit:
http://genome.lbl.gov/vista
GenomeVISTA
rVISTA