
Gene expression data in VectorBase
Fotis Kafatos, George Christophides, Bob MacCallum & Seth Redmond
Imperial College London
(thanks also to EBI, Sanger and ND)
VectorBase
Outline
1. Project goals
2. What’s currently available
3. Current challenges and future plans
Project goals
• For vector biologists:
– Easy access to gene expression data
• consistent data processing
• For array specialists:
– ArrayExpress submission
– Advanced analysis tools
– Array annotation
[Diagram boxes: EXPRESSION DATA; BULK LOADER; STORAGE & ANALYSIS]
• BASE: BioArray Software Environment
• http://base.thep.lu.se/
• Open source, active development and user community
• LIMS, data storage, export and analysis
• Web-based, user/group access control
• BASE 2.x adoption will bring Affy support
Data submission
• Community submission guidelines available
• First batch of experiments loaded by us
• Bulk data loader
• Sample/experiment annotation requires intervention from curators
[Diagram boxes: ArrayExpress; EXPRESSION DATA; BULK LOADER; ‘PUBLIC’ STORAGE; VectorBase; STORAGE & ANALYSIS]
• Data held in BASE is largely MIAME compliant
• Script for semi-automated export in TAB2MAGE format
• One experiment submitted so far
[Diagram boxes: ArrayExpress; EXPRESSION DATA; BULK LOADER; ‘PUBLIC’ STORAGE; VectorBase; STORAGE & ANALYSIS; DATA SUMMARIES]
• BASE web interface offers powerful and extendable analysis environment
• Can be used for multi-site collaborations on pre-publication data
• Steep learning curve/not 100% intuitive
• Not easily linked to
• We provide simpler views so the casual user can quickly draw biological inferences
Standardised data
All displayed data is processed in the same way:
1. Poor quality spots removed
• Currently using submitted spot flags
2. Normalisation
• “lowess” for two-colour experiments
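The two processing steps above can be sketched in a few lines. This is a minimal illustration, not the VectorBase pipeline: the spot fields (`id`, `flag`, `red`, `green`), the convention that a flag of 0 marks a good spot, and the smoothing fraction are all assumptions. The smoother is a single-pass, tricube-weighted local linear fit, i.e. lowess without its robustness iterations.

```python
import math

def tricube(u):
    """Tricube kernel weight for a scaled distance u >= 0."""
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def local_linear_fit(x, y, frac=0.5):
    """Fit y on x at every x with a tricube-weighted local line (one pass)."""
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # neighbours per local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest spot
        w = [tricube(abs(xi - x0) / h) for xi in x]
        sw = sum(w)  # always > 0: the spot at x0 itself has weight 1
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def normalise(spots, frac=0.5):
    """Drop flagged spots, then subtract the intensity-dependent trend.

    Returns {spot id: normalised log2(red/green)}.
    """
    # Step 1: keep only spots flagged as good (assumed: flag == 0).
    good = [s for s in spots
            if s["flag"] == 0 and s["red"] > 0 and s["green"] > 0]
    # Step 2: log-ratio (M) against mean log-intensity (A), then detrend.
    m = [math.log2(s["red"] / s["green"]) for s in good]
    a = [0.5 * math.log2(s["red"] * s["green"]) for s in good]
    trend = local_linear_fit(a, m, frac)
    return {s["id"]: mi - ti for s, mi, ti in zip(good, m, trend)}
```

A production pipeline would more likely call an established lowess implementation; the hand-written fit is used here only to keep the example self-contained.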
[Diagram boxes: ArrayExpress; EXPRESSION DATA; BULK LOADER; PROBE MAPPING; ‘PUBLIC’ STORAGE; VectorBase; STORAGE & ANALYSIS; DATA SUMMARIES]
• 3 probe types
• 6 array designs
• Mapping handled via Ensembl pipeline:
– Oligo → exonerate
– PCR → e-PCR
– cDNA → exonerate2genes
[Diagram boxes: ArrayExpress; EXPRESSION DATA; BULK LOADER; GENOMIC DATA; PROBE MAPPING; AUTOMATIC ANNOTATION; GFF3; ‘PUBLIC’ STORAGE; VectorBase; STORAGE & ANALYSIS; DATA SUMMARIES; GENOME BROWSER]
[Screenshot: contigview]
[Screenshot: featureview]
[Diagram boxes: ArrayExpress; EXPRESSION DATA; BULK LOADER; ‘PUBLIC’ STORAGE; STORAGE & ANALYSIS; GENOMIC DATA; PROBE MAPPING; AUTOMATIC ANNOTATION; DATA SUMMARIES; GENOME BROWSER; DATA MINING. Audience labels: ARRAY BIOLOGISTS; GENOME BIOLOGISTS; VECTOR BIOLOGISTS]
BioMart
• Beta version currently available
– http://base.vectorbase.org:9999/biomart/martview
• Improvements still needed:
– experiment annotations
– Alignments (i.e. handle split alignments)
• Federation with current marts
• Integration with new data?
Current challenges and future plans
• How do you want to query?
• CVs & ontologies
• APIs
• Community submission
• Manual annotation
Querying strategy
• What do you want to query on?
– Fetch all genes upregulated under condition X
– Fetch all experiments with gene X and condition Y
– Fetch all probes with expression similar to probe X
• All essentially boil down to:
– Define probe (genes etc)
– Define significant expression
• ANOVA?
• Up/down-regulation WRT what?
– Define experimental conditions
• Sample annotation
• Experimental design
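The decomposition above (define probes, define significance, define conditions) can be written as a few small filters. The sketch below is illustrative only: the `Measurement` record, its field names, and the thresholds (a log2 ratio of at least 1 and p ≤ 0.05) are assumptions. The reference against which up-regulation is measured is exactly the open question raised above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Measurement:
    """One probe-level result (hypothetical record layout)."""
    experiment: str
    condition: str      # from sample annotation / experimental design
    probe: str
    gene: str           # from probe mapping
    log2_ratio: float   # relative to the experiment's reference sample
    p_value: float

def is_upregulated(m, min_log2_ratio=1.0, max_p=0.05):
    """Assumed definition of 'significant' up-regulation."""
    return m.log2_ratio >= min_log2_ratio and m.p_value <= max_p

def genes_upregulated(measurements, condition, **thresholds):
    """Fetch all genes upregulated under a condition."""
    return sorted({m.gene for m in measurements
                   if m.condition == condition
                   and is_upregulated(m, **thresholds)})

def experiments_with(measurements, gene, condition):
    """Fetch all experiments that measured a gene under a condition."""
    return sorted({m.experiment for m in measurements
                   if m.gene == gene and m.condition == condition})

# Made-up example values, for illustration only.
data = [
    Measurement("exp1", "X", "p1", "geneA", 2.0, 0.01),
    Measurement("exp1", "X", "p2", "geneB", 0.2, 0.01),
    Measurement("exp2", "Y", "p1", "geneA", 3.0, 0.001),
]
print(genes_upregulated(data, "X"))
print(experiments_with(data, "geneA", "Y"))
```

Keeping the significance test in one function means the choice of statistic (ANOVA or otherwise) can change without touching the queries built on it.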
[Diagram boxes: ArrayExpress; EXPRESSION DATA; BULK LOADER; ‘PUBLIC’ STORAGE; STORAGE & ANALYSIS; GENOMIC DATA; PROBE MAPPING; CV / ONTOLOGY; AUTOMATIC ANNOTATION; DATA SUMMARIES; GENOME BROWSER; DATA MINING. Audience labels: ARRAY BIOLOGISTS; GENOME BIOLOGISTS; VECTOR BIOLOGISTS]
[Diagram boxes: ArrayExpress; EXPRESSION DATA; BULK LOADER; ‘PUBLIC’ STORAGE; STORAGE & ANALYSIS; GENOMIC DATA; PROBE MAPPING; CV / ONTOLOGY; AUTOMATIC ANNOTATION; DATA SUMMARIES; GENOME BROWSER; DATA MINING. API labels: AE API ?; Array API ?; e! API; MartJ / MQL]
Array API
Perl / Java objects for retrieval / handling of array data
– Dual purpose:
• Consistency & efficiency of VB expression website
• Computational access to VB data for all
– Objects must be:
• General, DB-independent
• Compatible with pre-existing Bio API (BioPerl / BioJava)
– NB: there may be a pre-existing solution:
• ArrayExpress API?
• BioPerl-Expression?
• MAGE-OM-stk
• http://neuron.cse.nd.edu/vectorbase/index.php/Array_API_proposal
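To make "general, DB-independent" concrete, one option is to code against an abstract interface and hide each storage back end behind it. The sketch below uses Python for brevity, although the slide proposes Perl / Java. Every class and method name is hypothetical; none comes from the Array API proposal, BioPerl, BioJava, or ArrayExpress.

```python
from abc import ABC, abstractmethod

class ArrayDataSource(ABC):
    """Hypothetical storage-independent view of array data."""

    @abstractmethod
    def probes_for_gene(self, gene_id):
        """Return the IDs of probes mapped to a gene."""

    @abstractmethod
    def expression(self, probe_id, experiment_id):
        """Return the value for a probe in an experiment, or None."""

class InMemorySource(ArrayDataSource):
    """Toy back end; a database-backed class would offer the same methods."""

    def __init__(self, probe_map, values):
        self._probe_map = probe_map   # {gene_id: [probe_id, ...]}
        self._values = values         # {(probe_id, experiment_id): float}

    def probes_for_gene(self, gene_id):
        return list(self._probe_map.get(gene_id, []))

    def expression(self, probe_id, experiment_id):
        return self._values.get((probe_id, experiment_id))

def gene_expression(source, gene_id, experiment_id):
    """Client code depends only on the interface, not on the back end."""
    values = {}
    for probe_id in source.probes_for_gene(gene_id):
        value = source.expression(probe_id, experiment_id)
        if value is not None:
            values[probe_id] = value
    return values
```

Under this design the expression website and external scripts would both call `gene_expression`, which is one way to meet the dual purpose listed above.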
Community data submission
• Carrot?
– Help with ArrayExpress submission
– Analysis tools
– Dissemination
• Stick?
– Outreach (courses, conferences)
– Networking
GE data → manual annotators
• EST clone-based arrays
– http://tinyurl.com/vlkwo
• Gene-build designed arrays
– Negative evidence less compelling
Longer term plans
• Host-parasite GE data integration & analysis
• GE-clusters → “upstream” regions → regulatory elements, upstream TFs
• RNAi phenotypes
• Images
CVs & ontologies
• Integrate MGED and specialist ontologies for
– Body parts
– Developmental stages
– Disease processes
– …
• Allows comparison across experiments with similar experimental conditions
BioMart
Most biomarts:
• Gene-based
• Mostly ‘binary’ data
– e.g. a gene either has a signal domain or doesn’t
• Easily linked with other (gene-based) biomarts
VB Biomart:
• Probe based
– Many probes not aligned
• Exp data less clear
– e.g. define ‘differential expression’
• Exports gene/trans IDs for linking to other Marts
Clustering
• A priority?
• Easy to do on reporter level within experiments
• Harder to do at gene level across all experiments
– Binary gene profile: “yes/no differentially expressed in experiment”?
• Amazon-style links to “genes which may have similar expression profiles”?
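The binary-profile idea can be illustrated with a simple set comparison. This is a sketch under stated assumptions, not a proposed implementation: each gene is represented by the set of experiments in which it was called differentially expressed, and Jaccard similarity is one arbitrary choice of overlap measure. The function names and the `top_n` parameter are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_genes(gene, profiles, top_n=5):
    """Rank other genes by overlap of their binary expression profiles.

    profiles: {gene: set of experiments in which the gene was called
    differentially expressed}.
    """
    target = profiles[gene]
    scored = [(other, jaccard(target, experiments))
              for other, experiments in profiles.items() if other != gene]
    scored = [(g, s) for g, s in scored if s > 0]       # drop non-overlapping genes
    scored.sort(key=lambda item: (-item[1], item[0]))   # best first, ties by name
    return scored[:top_n]
```

One limitation of this representation is that a gene absent from an array looks the same as a gene that was measured and not differentially expressed, so a real profile would probably need a third "not measured" state.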
BASE 2.x
• Adoption delayed, now in progress
• Brings Affymetrix support
• Cleaner/modern interface
• Better API (Java)