MB_genedb_extended


Marie-Adèle Rajandream
The Pathogen Sequencing Unit
The Sanger Institute
The Wellcome Trust Genome Campus
Hinxton
Cambridge
United Kingdom
The Sanger Institute
 Principally funded by Wellcome Trust (about 96 %)
 60,000,000 bases per day of raw data
 600 employees
 Sequencing of human, mouse, zebrafish and pathogen genomes
 Manual and automatic genome annotation (Ensembl, Artemis)
 Identification of cancer-causing mutations (recently the BRAF gene mutation)
 Sequence variation and disease association
The Pathogen Sequencing Unit
Sequencing
 Small genomes (bacterial and model organisms)
 60-70 projects
 Current capacity: 4 million reads per year, sufficient for 100 Mb of finished sequence
 Mainly whole genome/chromosome shotguns including finishing
 Many are international collaborations
 Larger, more complex genomes (35-100 Mb) on the horizon
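The capacity figure on this slide can be unpacked with a little arithmetic. The sketch below uses only the two numbers quoted above (4 million reads per year, 100 Mb finished per year); the average read length of 500 bases is an assumption added for illustration and does not come from the slides.

```python
# Figures quoted on the slide.
READS_PER_YEAR = 4_000_000      # 4 million reads per year
FINISHED_MB_PER_YEAR = 100      # 100 Mb of finished sequence per year

# Assumption for illustration only (not from the slides):
# average usable read length, in bases.
ASSUMED_READ_LENGTH = 500


def reads_per_finished_mb(reads: int, finished_mb: int) -> float:
    """Number of reads spent per megabase of finished sequence."""
    return reads / finished_mb


def implied_raw_coverage(reads: int, finished_mb: int, read_length: int) -> float:
    """Raw bases sequenced per finished base, for a given read length."""
    return (reads * read_length) / (finished_mb * 1_000_000)


if __name__ == "__main__":
    print(reads_per_finished_mb(READS_PER_YEAR, FINISHED_MB_PER_YEAR))
    print(implied_raw_coverage(READS_PER_YEAR, FINISHED_MB_PER_YEAR,
                               ASSUMED_READ_LENGTH))
```

Under these numbers the unit budgets 40,000 reads per finished megabase. The coverage figure depends entirely on the assumed read length, so it should be read as a rough guide only.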
Informatics
 Automatic analysis
 Manual annotation by expert biologists
 Tools: finishing (Cyclops), annotation (Artemis), comparative analysis (ACT)
 Data dissemination
 Database resources
Functional Genomics
 S. pombe
 Bacterial Genomes
 D. discoideum
GeneDB
http://www.genedb.org
[Figure: GeneDB (http://www.genedb.org). Labels: project pages, analysis, BLAST, sequences, FTP site, annotation, curation]
What is GeneDB?
• a generic organism database
• annotated sequences as well as functional data
• visualisation in user-friendly environment
• annotation and analysis of data by biologists
• flexible enough to incorporate new data types
• linked to external databases
• fully curated
The GeneDB project
• Started in 2001
• Funded by the Wellcome Trust for a period of 5 years
• Initially for 3 organisms: S. pombe, Leishmania & Trypanosome
• 2 full-time programmers, 1 part-time programmer
• One curator for each organism
• One helpdesk person / programmer
• Prototype now done and in use
Technical Outline Prototype
[Diagram: layout of the prototype. Labels: Data, Web, “Java”, EMBL; asp, jsp, biojava, cgi, blast, omniblast, data, gui, common, images, serialise, indices, minelet, mining, utils, web, test; organisms: cerevisiae, pombe, malaria, leish, tryp]
Broad specifications for production version
• Relational database
• Curator / annotator interface incorporating functionality of Artemis (MESS)
• Facility for doing more complex queries
For comprehensive, detailed specs see our Functional Specifications document
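To make "relational database" and "more complex queries" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and demo rows are all hypothetical and are not taken from the GeneDB specifications; the point is only to show the kind of cross-organism join that a relational back end makes easy.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical three-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE organism (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE gene (
            id            INTEGER PRIMARY KEY,
            organism_id   INTEGER NOT NULL REFERENCES organism(id),
            systematic_id TEXT NOT NULL,
            product       TEXT
        );
        CREATE TABLE annotation (
            id      INTEGER PRIMARY KEY,
            gene_id INTEGER NOT NULL REFERENCES gene(id),
            source  TEXT NOT NULL,              -- 'manual' or 'automatic'
            curated INTEGER NOT NULL DEFAULT 0  -- 1 if checked by a curator
        );
    """)
    # Demo rows are invented placeholders, not real GeneDB records.
    conn.executemany("INSERT INTO organism VALUES (?, ?)",
                     [(1, "S. pombe"), (2, "Leishmania")])
    conn.executemany("INSERT INTO gene VALUES (?, ?, ?, ?)", [
        (1, 1, "demo_gene_1", "protein kinase"),
        (2, 1, "demo_gene_2", "kinase regulator"),
        (3, 2, "demo_gene_3", "protein kinase"),
        (4, 2, "demo_gene_4", "hypothetical protein"),
    ])
    conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?)", [
        (1, 1, "manual", 1),
        (2, 1, "automatic", 0),
        (3, 2, "automatic", 0),
        (4, 3, "manual", 1),
        (5, 4, "manual", 1),
    ])
    return conn


def count_curated_genes(conn: sqlite3.Connection, product_keyword: str):
    """Per organism, count genes matching a product keyword that have
    at least one curated annotation."""
    return conn.execute(
        """
        SELECT o.name, COUNT(DISTINCT g.id)
        FROM organism AS o
        JOIN gene AS g ON g.organism_id = o.id
        JOIN annotation AS a ON a.gene_id = g.id
        WHERE g.product LIKE ? AND a.curated = 1
        GROUP BY o.name
        ORDER BY o.name
        """,
        (f"%{product_keyword}%",),
    ).fetchall()


if __name__ == "__main__":
    print(count_curated_genes(build_demo_db(), "kinase"))
```

A query like this spans several organisms and combines product text with curation status in one statement, which is the kind of question the "more complex queries" bullet points at.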
P. falciparum chr. 14
“biotin carboxylase”
Inferred by Sequence Similarity with a yeast sequence, SGD:S0005299 (which was originally annotated based on a published mutant phenotype)
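The example above describes an annotation transferred from a yeast entry to a P. falciparum gene while recording how it was inferred. A minimal sketch of that bookkeeping follows. The class, field and function names are hypothetical, and the P. falciparum gene identifier is a placeholder because the slide does not give one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    """One functional annotation plus the evidence behind it (hypothetical model)."""
    gene: str                      # identifier of the annotated gene
    product: str                   # e.g. "biotin carboxylase"
    evidence: str                  # how the annotation was established
    with_id: Optional[str] = None  # entry the inference was made from, if any


def transfer_by_similarity(source: Annotation, target_gene: str) -> Annotation:
    """Copy the product to a similar gene, recording the source entry
    so the chain of evidence stays traceable."""
    return Annotation(
        gene=target_gene,
        product=source.product,
        evidence="inferred by sequence similarity",
        with_id=source.gene,
    )


if __name__ == "__main__":
    # The yeast entry cited on the slide, annotated from a published phenotype.
    yeast = Annotation(
        gene="SGD:S0005299",
        product="biotin carboxylase",
        evidence="published mutant phenotype",
    )
    # Placeholder identifier: the slide only says "P. falciparum chr. 14".
    pf = transfer_by_similarity(yeast, "P. falciparum chr. 14 gene (placeholder)")
    print(pf)
```

Keeping the source identifier alongside the evidence lets a reader tell a similarity-based annotation apart from one backed directly by experiment.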
Wellcome Trust Sanger Institute
Pathogen Sequencing Unit
Project Management
Sequencing
Bart Barrell
Julian Parkhill
Marie-Adèle Rajandream
Al Ivens
Neil Hall
Carol Churcher
Karen Brooks
Inna Cherevach
Tracey Chillingworth
Kay Clarke
Paul Davies
Nancy Hamlin
Kay Jagels
Sharon Moule
Brian White
Sally Whitehead
Analysis
Martin Aslett
Steven Bentley
Matthew Berriman
Ana Cerdeno
Christiane Hertz-Fowler
Matthew Holden
Keith James
Rachel Lyne
Arnab Pain
Chris Peacock
Mohammed Sebaihia
Nick Thomson
Valerie Wood
Programming
Rob Davies
David Harper
Arnaud Kerhornou
Paul Mooney
Kim Rutherford
Adrian Tivey
Ed Zuiderwijk
Subcloning
Ann Cronin
Audrey Fraser
David Johnson
Mike Quail
Claire Price
Ester Rabbinowitsch
Sarah Sharp
Mapping
Administration
Yvonne Shaw
Maria Fookes
John Woodward
David Harris
Matthew Collins
Nigel Fosker
Arlette Goble
Lee Murphy
Susan O’Neil
Simon Rutter
David Saunders
Kathy Seeger
Robert Squares
Steven Squares
Karen Mungall
Theresa Feltwell
Ian Goodhead
Zahra Hance
Heidi Hauser
Mandy Sanders
Mark Simmonds
Danielle Walker
Barbara Harris
Becky Atkin
Andrew Barron
Carol Chillingworth
Louise Clarke
Craig Corton
Jonathan Doggett
Nicola Lennard
Alexandra Line
Doug Ormand