
NGS Bioinformatics Workshop
1.5 Tutorial – Genome Annotation
April 5th, 2012
IRMACS 10900
Facilitator: Richard Bruskiewich
Adjunct Professor, MBB
Workflow for Today
Prepare to visualize annotation
Get a genomic sequence from GenBank
Repeat mask it.
Retrieve a genomic sequence…
Retrieve a relatively small (<100 kb) eukaryotic genomic sequence clone from GenBank
Query the Nucleotide division, e.g. Arabidopsis BAC clone (HE601748.1)
Select FASTA
Save… To File… As “FASTA” (rename the file?)
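The same retrieval can also be scripted instead of clicked through. The sketch below assumes the NCBI E-utilities efetch service and the nuccore database; the function names and the output file name are made up for illustration, and the download step needs network access.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession):
    """Build an efetch URL that returns a nucleotide record as FASTA text."""
    params = {
        "db": "nuccore",       # the Nucleotide database
        "id": accession,       # e.g. the BAC clone used in this tutorial
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


def fetch_fasta(accession, out_path):
    """Download the record and save it to out_path (requires network access)."""
    with urlopen(build_efetch_url(accession)) as response:
        text = response.read().decode("utf-8")
    with open(out_path, "w") as handle:
        handle.write(text)
    return text


# Example call (hypothetical output file name):
# fetch_fasta("HE601748.1", "HE601748.fasta")
```

Keeping the URL construction separate from the download makes it easy to check the request before sending it.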
BLAST is low-hanging fruit…
Use BLAST to quickly survey for similar sequences
Megablast against the nucleotide database
e.g. HE601748 is closest to A. thaliana chr. 5?
Megablast against the reference RNA sequence database
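If the BLAST hits are downloaded as a tab-delimited hit table, the closest match per query can be picked out with a few lines of code. This is only a sketch: it assumes the standard 12-column tabular layout (query, subject, % identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score), and the function name is hypothetical.

```python
def best_hits(hit_table_text):
    """Return {query_id: (subject_id, percent_identity, bit_score)}.

    Keeps the hit with the highest bit score for each query.
    Assumes 12 tab-separated columns; comment lines start with '#'.
    """
    best = {}
    for line in hit_table_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a hit row in the assumed layout
        query, subject = fields[0], fields[1]
        identity = float(fields[2])
        bit_score = float(fields[11])
        if query not in best or bit_score > best[query][2]:
            best[query] = (subject, identity, bit_score)
    return best
```

Sorting by bit score rather than by file order is a design choice here; it avoids relying on the order in which the hits were written.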
Repeat Masking
Upload the clone file to RepeatMasker on the web and run with appropriate parameters:
http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker
Save the results (including the masked sequence) to your computer
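Once the masked sequence is saved, a quick sanity check is to see how much of the clone was masked. The sketch below assumes a single-record FASTA file in which masked bases appear as N, X, or lowercase letters (which one depends on the masking options chosen); any N already present in the original sequence is counted too. The function names are made up for illustration.

```python
def read_fasta_sequence(fasta_text):
    """Return the sequence of a single-record FASTA file as one string."""
    return "".join(
        line.strip()
        for line in fasta_text.splitlines()
        if line.strip() and not line.startswith(">")
    )


def masked_fraction(sequence):
    """Fraction of positions that look masked (N, X, or lowercase)."""
    if not sequence:
        return 0.0
    masked = sum(1 for base in sequence if base in "NX" or base.islower())
    return masked / len(sequence)


# Example use with a hypothetical file name:
# with open("HE601748.fasta.masked") as handle:
#     print(masked_fraction(read_fasta_sequence(handle.read())))
```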
ab initio Gene Predictions
Genscan:
http://genes.mit.edu/GENSCAN.html
Cut and paste results as text to a file
Fgenesh:
www.softberry.com
Blast2GO
http://www.blast2go.com
Annotation workbench, via Gene Ontology (GO) terms.
First, save the predicted peptides (e.g. from Fgenesh)
The FASTA headers need to be fixed to assign proper identifiers (could write a script?)
Start the Blast2GO workbench (via Java Web Start)
Load in the peptides
Do the analysis… e.g. run blastp, GO, annotation, InterPro, etc.
See www.geneontology.org for details on GO
See http://www.ebi.ac.uk/interpro/ for InterPro information
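The header-fixing script mentioned above could look roughly like this. It is a minimal sketch, not a required format: it gives every peptide a short sequential identifier and keeps the original header text as the description. The "pep" prefix, the numbering scheme, and the file names are assumptions made for illustration.

```python
def rename_fasta_headers(fasta_text, prefix="pep"):
    """Give each FASTA record a short, unique identifier.

    The new header is '>prefix_NNNN original header text', so the
    original information is kept as the description.
    """
    out_lines = []
    count = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            count += 1
            new_id = "{}_{:04d}".format(prefix, count)
            description = line[1:].strip()
            out_lines.append(">{} {}".format(new_id, description).rstrip())
        elif line.strip():
            out_lines.append(line.strip())  # sequence line, blank lines dropped
    return "\n".join(out_lines) + "\n"


# Example use with hypothetical file names:
# with open("predicted_peptides.fasta") as src:
#     fixed = rename_fasta_headers(src.read(), prefix="HE601748_pep")
# with open("predicted_peptides.renamed.fasta", "w") as dst:
#     dst.write(fixed)
```

Short identifiers without spaces tend to survive loading into downstream tools more reliably than long free-text headers.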
EMBOSS
European Molecular Biology Open Software Suite (EMBOSS):
http://emboss.sourceforge.net
Download and install the version of interest (e.g. Linux, Mac OS X, Windows…)
Decide what to do:
http://emboss.sourceforge.net/apps/groups.html
Let’s try a CpG island plot (cpgplot)
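To see what a CpG island plot is measuring, the two usual statistics can be computed by hand in a sliding window: percent G+C, and the observed/expected CpG ratio, where expected = (number of C × number of G) / window length. The sketch below is not the cpgplot implementation; the function names and the default window and step sizes are assumptions made for illustration.

```python
def cpg_stats(window):
    """Return (percent G+C, observed/expected CpG ratio) for one window."""
    seq = window.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c_count = seq.count("C")
    g_count = seq.count("G")
    observed = seq.count("CG")          # CpG dinucleotides
    expected = c_count * g_count / n    # expected CpG count for this composition
    gc_percent = 100.0 * (c_count + g_count) / n
    ratio = observed / expected if expected else 0.0
    return gc_percent, ratio


def sliding_cpg(sequence, window=100, step=1):
    """Yield (start, percent G+C, obs/exp ratio) for each window position."""
    for start in range(0, len(sequence) - window + 1, step):
        gc_percent, ratio = cpg_stats(sequence[start:start + window])
        yield start, gc_percent, ratio
```

Windows with both a high G+C percentage and a high observed/expected ratio are the candidate CpG island regions that the plot highlights.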
Study Genes by Comparative Genomics
JGI Vista toolkit:
http://genome.lbl.gov/vista
GenomeVista
rVista