Daphnia Genome Database
from Common Components
Daphnia Genomic Consortium
Meeting, Sept. 2003
Don Gilbert, [email protected]
A Replicable Genome
infOrmation System (Argos)
http://eugenes.org/argos | flybase.net/flybase-ng
java/ ; perl/ -- program libraries and packages
servers/ -- major programs (BLAST, MySQL/PostgreSQL, others)
systems/ -- OS executables of programs
daphnia/ -- implemented organism genome systems
docs/ & install/ -- Argos instructions and usage
template/ -- structure for new projects
ROOT/ -- common directory of installed projects
Argos features
Common genome tool set
Share benefits of “best of breed” genome tools
Common parts are tested & maintained by others
Minimal IT expertise (no compiles or system management)
Choice of tools (existing or new genome DBs use only the parts desired)
Flexible project packages
• Project needs specify tool set (compare EnsEMBL where all use one set)
• Own look’n’feel web pages, contents, functions
• Security for protected and public sections
Easy replication to any Unix computer
‘Live’ database system replication using rsync
Keep remote servers up-to-date every day
Local cluster/grid for high-volume traffic
Works on common workstations, laptops
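The rsync-based replication above can be sketched as a small Python wrapper. This is a minimal illustration, not the Argos replication script itself: the function names, paths and host name are hypothetical, and only the standard rsync options -a, -z, --delete and --dry-run are assumed.

```python
import subprocess


def build_rsync_command(source_dir, remote_host, remote_dir, dry_run=False):
    """Build an rsync command line that mirrors source_dir to a remote host."""
    # Trailing slash: copy the contents of source_dir, not the directory itself.
    source = source_dir.rstrip("/") + "/"
    command = [
        "rsync",
        "-a",        # archive mode: recurse, keep permissions, times, symlinks
        "-z",        # compress data in transit
        "--delete",  # remove remote files that no longer exist locally
    ]
    if dry_run:
        command.append("--dry-run")  # report what would change, copy nothing
    command += [source, f"{remote_host}:{remote_dir}"]
    return command


def replicate(source_dir, remote_host, remote_dir, dry_run=False):
    """Run rsync; raises CalledProcessError if rsync exits non-zero."""
    command = build_rsync_command(source_dir, remote_host, remote_dir, dry_run)
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical paths and host, for illustration only.
    print(" ".join(build_rsync_command(
        "/bio/argos/daphnia", "mirror.example.org", "/bio/argos/daphnia",
        dry_run=True)))
```

Running replicate() from a daily scheduled job (cron, for example) would be one way to keep a remote server current; a dry run first shows what would be transferred or deleted.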
Argos - advanced features
Data mining
• Fulfill need to search & retrieve 1000s of genes
• Web Services, Grid Services and LDAP for large data sets
• Simple, computable, industry standards for query by criteria and retrieval of volumes of data
• Bypass time-consuming web pages made for people
• Use with personal, lab databases to keep genome links up-to-date
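The idea of query by criteria with computable bulk results can be sketched in Python. This is an assumption-laden illustration: the endpoint URL, the parameter names and the tab-delimited reply format are all hypothetical, and a plain HTTP query stands in here for the Web Services, Grid Services and LDAP interfaces named above.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint; not a real wFleaBase or Argos URL.
BASE_URL = "http://example.org/daphnia/search"


def build_query_url(criteria, fmt="tsv", base_url=BASE_URL):
    """Turn a dict of search criteria into a query URL asking for bulk output."""
    params = dict(criteria)
    params["format"] = fmt
    # Sort the parameters so the same criteria always give the same URL.
    return base_url + "?" + urlencode(sorted(params.items()))


def parse_gene_table(text):
    """Parse a tab-delimited reply (header row first) into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


def fetch_genes(criteria):
    """Fetch and parse every record matching the criteria in one request."""
    with urlopen(build_query_url(criteria)) as response:
        return parse_gene_table(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(build_query_url({"species": "Daphnia pulex", "type": "gene"}))
    # Made-up reply, used only to show the parsing step.
    sample = "id\tsymbol\nG1\tabc\nG2\tdef\n"
    for row in parse_gene_table(sample):
        print(row["id"], row["symbol"])
```

A lab database could run such a query on a schedule and update its stored genome links from the parsed rows, without a person paging through web results.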
Argos common parts
Java common library, Ant builds, XML Tools,
Web Services (Axis), Lucene for “Google”-like searches
Perl common library of BioPerl, GBrowse, others
Servers include
Apache, Tomcat web servers
MySQL, PostgreSQL databases
Systems compiled for
apple-powerpc-darwin, intel-linux, sun-sparc-solaris
wFleaBase structure
Cgi-bin -- Web programs (Perl)
Common -- Link to common, shared tools
Conf -- Site configurations for web, data
Data -- Bulk data & FTP site folder
-- Project databases: blast, lucene, mysql
Indices -- Database indices
-- Program libraries
Web -- Web structure and documents
Genomics, Sequences, Maps, Literature, Stocks, Docs, other
includes Public and Protected (project member only) parts
Webapps -- Web programs (Java)
includes Search system, Secure web and editing
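A replicated copy of a project can be checked against this layout with a few lines of Python. This is a sketch under assumptions: the lowercase folder names are a guess at the on-disk spelling, the two entries listed above without a folder name are left out, and missing_dirs is a hypothetical helper, not part of Argos.

```python
import tempfile
from pathlib import Path

# Folder names taken from the structure above; lowercase spelling is assumed.
EXPECTED_DIRS = ["cgi-bin", "common", "conf", "data", "indices", "web", "webapps"]


def missing_dirs(project_root, expected=EXPECTED_DIRS):
    """Return the expected folders that are absent under project_root."""
    root = Path(project_root)
    # is_dir() follows symbolic links, so a linked common/ folder counts.
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    # Build a partial layout in a temporary directory to show the check.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ["cgi-bin", "conf", "web"]:
            (Path(tmp) / name).mkdir()
        print("missing:", missing_dirs(tmp))
```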
Search wFleaBase
BLAST wFleaBase
Edit wFleaBase
Where to put Daphnia Genome?
Database needs
Automated annotation and curated updates
Search and retrieve data subsets
EnsEMBL - working now, Gramene & others
GMOD:Chado - in development
(FlyBase, WormBase, ChlamyGenome, TIGR, others will use)
Other choices?
Generic Model Organism Database
Construction Set www.gmod.org
Genome+ Database (more than annotations)
Genome visualization tools
Genome annotation pipeline planned
Literature curation and Gene Ontology
Component system (pick and choose)
Developing - more complete in 2004
EnsEMBL Genome Database
Genome annotation database
Genome visualization tools
Genome annotation pipeline
Comprehensive system (all or none)
Production - useable now
From Shawn Hoon, Fugu Informatics Group
wFleaBase issues
• Basic web system ready for genome data?
• Start with EnsEMBL for management; move to GMOD:Chado if it proves a better choice?
• Add GMOD GBrowse; Apollo Editor with genome
• Add “Self-service” database features for?
• Easy management by scientists
• Genome data; stocks; research literature
• Add evolutionary, ecological, environmental data
Prototype at http://iubio.bio.indiana.edu/daphnia/
GBrowse Maps
Apollo Annotator