Transcript BioPerl
BioPerl
cpan
Open a terminal and type /bin/su to become root.
Start "cpan" and accept all defaults.
At the cpan prompt, type: install Bio::Graphics
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;

# create a sequence object of some DNA
my $seq = Bio::Seq->new(
    -id  => 'testseq',
    -seq => 'CATGTAGATAG');

# print out some details about it
print "seq is ", $seq->length, " bases long\n";
print "revcom seq is ", $seq->revcom->seq, "\n";

# write it to a file in Fasta format
# (the leading '>' opens the file for writing)
my $out = Bio::SeqIO->new(
    -file   => '>testseq.fsa',
    -format => 'Fasta');
$out->write_seq($seq);
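Reading works the same way in reverse. The following is a minimal sketch, assuming BioPerl is installed and the current directory is writable; it writes the same test record first so that it can run on its own, then reads it back with next_seq. The variable names ($file, @ids) are illustrative only.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;

# Write one small record first so this sketch is self-contained.
my $file = 'testseq.fsa';
my $out  = Bio::SeqIO->new(-file => ">$file", -format => 'fasta');
$out->write_seq(Bio::Seq->new(-id => 'testseq', -seq => 'CATGTAGATAG'));
$out->close;

# Read every record back; next_seq returns undef at the end of the file.
my $in = Bio::SeqIO->new(-file => $file, -format => 'fasta');
my @ids;
while (my $seq = $in->next_seq) {
    push @ids, $seq->display_id;
    print $seq->display_id, "\t", $seq->length, "\n";
}
```

Changing the -format argument (for example to 'embl' or 'genbank') is how the same loop would be pointed at other file formats.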
http://www.bioperl.org
“Bioperl is a collection of Perl modules that facilitate the development
of Perl scripts for bioinformatics applications.”
• Core package provides the main parsers. It is the basic package and
is required by all the other packages
• Run package provides wrappers for executing some 60 common
bioinformatics applications
• BioPerl db package is a subproject to store sequence and annotation
data in a BioSQL relational database
• Network package parses and analyzes protein-protein interaction
data
Open Bioinformatics Foundation
“.. a non profit, volunteer run organization focused on supporting open source
programming in bioinformatics.”
• BioDAS - XML infrastructure for exchanging genome annotations
• BioJava - Java toolkit
• BioMOBY - Data and application execution through web services
• BioPerl - Perl toolkit
• BioPipe - Pipelines and workflow project for creating bioinformatics protocols
• BioPython - Python toolkit
• BioRuby - Ruby toolkit
• BioSQL - RDBMS database schema for storing sequences, annotations, taxa data
• OBDA - A standard for sequence data access locally, remotely, and via RDBMS
• EMBOSS - Sequence analysis toolkit
BioPerl Sequence objects
• Bio::Seq - Sequence object, with features
– Default sequence object
• Bio::PrimarySeq - Bioperl lightweight Sequence Object
– CPU and memory efficient
• Bio::Seq::RichSeq - Module implementing a sequence created from
a rich sequence database entry
– For sequences obtained from, among others, the EMBL database
• Bio::Seq::LargeSeq - SeqI compliant object that stores sequence as
files in /tmp
– For sequences > 100 Mbases
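The difference between the first two object types can be sketched as follows. This is a minimal illustration, assuming BioPerl is installed; the ids 'light' and 'full', the sequence, and the CDS coordinates are made up for the example.

```perl
use strict;
use warnings;
use Bio::PrimarySeq;
use Bio::Seq;
use Bio::SeqFeature::Generic;

# Lightweight object: id, sequence and alphabet, but no features.
my $pseq = Bio::PrimarySeq->new(
    -id => 'light', -seq => 'ATGGCCTAA', -alphabet => 'dna');

# Default object: same sequence data plus feature handling.
my $seq = Bio::Seq->new(
    -id => 'full', -seq => 'ATGGCCTAA', -alphabet => 'dna');

# Attach one (hypothetical) feature to the Bio::Seq object.
my $feat = Bio::SeqFeature::Generic->new(
    -start => 1, -end => 9, -strand => 1, -primary_tag => 'CDS');
$seq->add_SeqFeature($feat);

my @features = $seq->get_SeqFeatures;
print "features on ", $seq->display_id, ": ", scalar(@features), "\n";
print "PrimarySeq can hold features: ",
    ($pseq->can('add_SeqFeature') ? 'yes' : 'no'), "\n";
```

A Bio::Seq object keeps its raw sequence in a Bio::PrimarySeq internally, available through $seq->primary_seq, which is why the lighter object is enough when no annotation is needed.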
Sequence and annotation schematic
Incomplete list of topics covered by BioPerl:
• Accessing sequence data from local and remote databases
• Manipulating sequences
• Translating
• Obtaining basic sequence statistics (SeqStats, SeqWord)
• Identifying restriction enzyme sites (Bio::Restriction)
• Identifying amino acid cleavage sites (Sigcleave)
• Running BLAST
• Parsing BLAST and FASTA
• Searching for genes and other structures on genomic DNA (Genscan, Sim4, Grail,
Genemark, ESTScan, MZEF, EPCR)
• Aligning 2 sequences
• Aligning multiple sequences (Clustalw.pm, TCoffee.pm)
• Manipulating clusters of sequences (Cluster, ClusterIO)
• Representing sequence annotations
• Using 3D structure objects and reading PDB files (StructureI, Structure::IO)
• Tree objects and phylogenetic trees (Tree::Tree, TreeIO, PAML)
• Bibliographic objects for querying bibliographic databases (Biblio)
• Graphics objects for representing sequence objects as images (Graphics)
• Sequence manipulation using the Bioperl EMBOSS and PISE interfaces
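Three of the topics above (manipulating, translating, basic statistics) fit in a few lines. This is a minimal sketch, assuming BioPerl is installed; the 12-base sequence and the id 'demo' are invented for the example.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::Tools::SeqStats;

my $seq = Bio::Seq->new(
    -id => 'demo', -seq => 'ATGGCCAAATAA', -alphabet => 'dna');

# Manipulating: coordinates are 1-based and inclusive.
my $first_codon = $seq->subseq(1, 3);
my $revcom      = $seq->revcom->seq;

# Translating: translate returns a new protein sequence object.
my $protein = $seq->translate->seq;

# Statistics: count_monomers returns a hash reference of base counts.
my $stats  = Bio::Tools::SeqStats->new(-seq => $seq);
my $counts = $stats->count_monomers;

print "first codon: $first_codon\n";
print "revcom:      $revcom\n";
print "protein:     $protein\n";
print "$_: $counts->{$_}\n" for sort keys %$counts;
```

With default settings the translation keeps the stop codon as a trailing '*'.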
Exercises
At http://bioperl.org/wiki/HOWTO:Graphics, try to run the
“A Better Version of the Feature Renderer” script.
Modify the script to accept an accession number instead of a
filename and retrieve the corresponding sequence from the
EMBL database. Test with accession number: J02933
Hint: see “Bio::DB::EMBL”. Where is the database located?
Create a BioPerl sequence object from example1.fasta and
add the ORF starting at position 11 as a feature. Display the
resulting sequence object using the feature renderer script.