Biopackages.net
Operating System Packages for
Bioinformatics
Allen Day
2005.05.17
What is a package?
Software, config files, documentation,
and/or data encapsulated in a single
file
Metadata describing:
Version, license, package “category”
Dependencies
What the package provides
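As a rough illustration of the metadata fields listed above, here is a minimal Python sketch. The `Package` class, its field names, and the example values (version, license, the `requires` and `provides` entries) are all hypothetical and are not taken from any real Biopackages.net RPM.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Package:
    """Hypothetical in-memory model of the metadata a package carries."""
    name: str
    version: str
    license: str
    category: str
    requires: List[str] = field(default_factory=list)  # dependencies
    provides: List[str] = field(default_factory=list)  # what the package provides

    def satisfies(self, capability: str) -> bool:
        # A requirement is met by the package's own name or by anything it provides.
        return capability == self.name or capability in self.provides


# Example values are placeholders for illustration only.
bioperl = Package(
    name="perl-bioperl",
    version="0.0.0",             # placeholder, not a real release number
    license="unspecified",       # placeholder
    category="Libraries",
    requires=["perl"],           # assumed dependency
    provides=["perl(Bio::Seq)"], # assumed capability string
)

print(bioperl.satisfies("perl(Bio::Seq)"))
```

A package manager compares the `requires` entries of one package against the names and `provides` entries of the others to decide what else must be installed.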
GMOD target audience
Small MODs
Package Dependency Graph
chado-Hsa
postgresql-AffxSeq
genome-Hsa-annotation-gene
genome-Hsa-annotation-affymetrix
chado
perl-bioperl
perl-go-perl
postgresql-server
genome-Hsa-nib
obo-core
ucsc-blat
What the package provides
Dependencies: Build Dependency, Installation Dependency
What is a Package Manager?
Tools to manage installation, upgrade,
uninstallation of packages
Verify package integrity (checksums)
Maintain system integrity
Transactional
Allow rollbacks
Dependency checking
Dependency graph recursion
Allow software customization (patches)
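The dependency checking and graph recursion mentioned above can be sketched as a depth-first walk that installs each package only after everything it depends on. This is a minimal sketch, not how RPM or yum implement it. The package names come from the dependency graph slide, but the edges in `DEPENDS_ON` are assumptions made for illustration, since the transcript does not preserve the arrows.

```python
from typing import Dict, List, Set

# Hypothetical edges: the names appear on the slide, the arrows are assumed.
DEPENDS_ON: Dict[str, List[str]] = {
    "chado-Hsa": ["chado", "genome-Hsa-annotation-gene"],
    "chado": ["postgresql-server", "perl-bioperl", "perl-go-perl"],
    "genome-Hsa-annotation-gene": ["genome-Hsa-nib"],
    "genome-Hsa-nib": ["ucsc-blat"],
    "postgresql-server": [],
    "perl-bioperl": [],
    "perl-go-perl": [],
    "ucsc-blat": [],
}


def install_order(target: str, graph: Dict[str, List[str]]) -> List[str]:
    """Return packages in an order where dependencies precede dependents."""
    order: List[str] = []
    done: Set[str] = set()
    in_progress: Set[str] = set()

    def visit(pkg: str) -> None:
        if pkg in done:
            return
        if pkg in in_progress:
            raise ValueError(f"dependency cycle at {pkg}")
        if pkg not in graph:
            raise KeyError(f"unknown package: {pkg}")
        in_progress.add(pkg)
        for dep in graph[pkg]:
            visit(dep)  # recurse into the dependency graph
        in_progress.discard(pkg)
        done.add(pkg)
        order.append(pkg)

    visit(target)
    return order


order = install_order("chado-Hsa", DEPENDS_ON)
print(order)
```

Requesting a single top-level package pulls in the whole tree, which is the point of automatic dependency installation.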
Why bioinformatics packages?
Consistency of installation process
Installation procedures for bioinformatics packages vary
wildly and commonly lack documentation
Automatic dependency installation
Perl modules are especially bad: Bioperl has 60+
modules in its dependency tree
Integrity/Auditing of system state
Know that an installed package works, which version is
installed, and how to replicate the system setup
Tighter integration with operating system
Daemons, config & log file locations, etc.
What’s available?
RPM packages only right now
Primary focus on Fedora Core 2
Some RPMs also available for
Fedora Core 3
Red Hat Linux 9
Cygwin
What’s available?
Three primary foci
Applications
Libraries
Data sets
Applications
GBrowse
Textpresso
BLAT daemon
NCBI Toolkit (BLAST, etc)
HMMER
What’s available?
Libraries
Bioperl
R & Bioconductor
Squid
EMBOSS
What’s available?
Data sets
Genome & protein sequence
Sequence features
Ontologies
All installed using a common directory
structure
What’s available?
UCSC tools (utilities, BLAT system
service, CGI scripts)
Bioperl
R / Bioconductor
GMOD apps (GBrowse, Textpresso, …)
Data packages
Genome sequence (fa, nib, blastdb)
Genome features (Affy probeset
alignments, mRNA, etc)
GMOD Components Available
das2-Hsa
gmod-web-Hsa
apollo-Hsa
cmap-Hsa
chado-Hsa
chado
gbrowse
textpresso
genome-Hsa-nib
turnkey
ucsc-blat
Replace ‘Hsa’ with the code for your organism
Currently built for ‘Cel’ (C. elegans), ‘Hsa’ (H. sapiens), ‘Sce’ (S. cerevisiae)
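The organism-code substitution described above can be sketched in a few lines of Python. The `TEMPLATES` list simply replaces ‘Hsa’ in the organism-specific package names from this slide with a placeholder. The function name `packages_for` is hypothetical, and the slides do not say whether every one of these packages is actually built for every organism.

```python
from typing import List

# Organism-specific package names from the slide, with 'Hsa' made a placeholder.
TEMPLATES = [
    "das2-{org}",
    "gmod-web-{org}",
    "apollo-{org}",
    "cmap-{org}",
    "chado-{org}",
    "genome-{org}-nib",
]

# Organism codes the slide lists as currently built.
BUILT_ORGANISMS = ("Cel", "Hsa", "Sce")


def packages_for(org: str) -> List[str]:
    """Expand the name templates for one organism code (hypothetical helper)."""
    if org not in BUILT_ORGANISMS:
        raise ValueError(f"no packages built for organism code {org!r}")
    return [template.format(org=org) for template in TEMPLATES]


for name in packages_for("Sce"):
    print(name)
```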
More details…
chado-Hsa
genome-Hsa-annotation-gene
genome-Hsa-annotation-affymetrix
postgresql-AffxSeq
chado
perl-go-perl
perl-bioperl
postgresql-server
…
…
…
genome-Hsa-nib
ucsc-blat
…
…
Gene Expression Components
DAS/2 for
Genotyping,
GeneChip
Quant/Norm
Pipeline
chado-GEC
chado-Hsa
R
Bioconductor
Resources
http://www.biopackages.net
~1000 RPMs for Fedora Core 2 and 3
Available via yum
See site for a configuration example.
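For readers unfamiliar with yum, a repository is declared as an INI-style stanza. The sketch below generates such a stanza with Python's `configparser`. It is not the configuration published on the site: the repository id, the display name, and above all the `baseurl` path are placeholders, and the real values must be taken from the site's own example.

```python
import configparser
import io

# Placeholder only: the real repository path is published on the site.
PLACEHOLDER_BASEURL = "http://www.biopackages.net/PATH-FROM-SITE-EXAMPLE/"


def yum_repo_stanza(repo_id: str, name: str, baseurl: str) -> str:
    """Render one yum repository stanza as INI text (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve key case exactly as written
    parser[repo_id] = {
        "name": name,
        "baseurl": baseurl,
        "enabled": "1",
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


stanza = yum_repo_stanza("biopackages", "Biopackages.net", PLACEHOLDER_BASEURL)
print(stanza)
```

With a stanza like this in yum's configuration, the packages can be installed by name and yum resolves their dependencies automatically.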
TODO
Support more architectures
Build for Cygwin & OS X. RPM has been
ported to both
Automate package build process
Build farm of multiple architectures,
controllable via scheduler (GridEngine)
Automate (if possible) inclusion of
new software / data releases
TODO
Build community interest and
involvement
Keep adding more packages!
Keep existing packages current!
Acknowledgements
Patrick Alger
Jared Fox
Brian O’Connor
Todd Harris
Lincoln Stein
Stanley Nelson