MGED 7, 2004 Poster

BarleyBase: BarleyBase.org
BARLEYBASE – A MIAME-COMPLIANT EXPRESSION PROFILING DATABASE FOR PLANTS
Lishuang Shen, Jian Gong, Jianqiang Xin, Xiaoyun Tang, Rico A. Caldo, Stacy Turner, Dan Nettleton, Roger P. Wise, Julie A. Dickerson*
Virtual Reality Applications Center, Iowa State University, Ames, Iowa 50011
Abstract
BarleyBase (www.BarleyBase.org) is a USDA-funded public
repository for plant microarray data. BarleyBase houses raw and
normalized expression data from the 22K Affymetrix Barley1 and
Arabidopsis ATH1 GeneChips, plus experiment and sample annotation.
It is expanding to other plant microarray platforms.
BarleyBase features a web-based, MIAME-compliant experiment
submission tool, BarleyExpress. BarleyExpress allows users to efficiently
submit and manage their experiment descriptions, array design, and
expression analysis information.
BarleyBase offers a broad set of query and display options at all data
levels: experiment, hybridization, probe set, and probe.
Users can run cross-experiment queries on probe sets by expression profile
and by biological information. Probe set queries are integrated
with visualization and analysis tools such as scatter plots, the R statistical
toolbox, and data filters.
BarleyExpress: Web-Based Submission
• BarleyExpress is a MIAME-compliant microarray submission and
annotation tool adapted from MIAMExpress.
• Submitters first input experiment design information.
• Annotate the experiment's factorial design with factors and factor levels.
• Batch upload raw GeneChip data files.
• Associate raw data files with each studied treatment.
• Protocol submission – optional.
• Input sample preparation details for each hybridization. Use templates to
reuse previous sample submissions.
• Finalize experiment submission.
• Submitters grant access to designated individuals and groups.
• Plant ontology and controlled vocabulary are enforced at each step.
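The factorial annotation and file association steps above can be sketched in a few lines. This is a minimal illustration only, not BarleyExpress code: the factor names, levels, replicate count, and CEL file naming scheme are all hypothetical.

```python
from itertools import product

# Hypothetical factors and levels for a two-factor experiment.
factors = {
    "genotype": ["wild type", "mutant"],
    "hours_after_inoculation": ["0", "8", "16"],
}

def enumerate_treatments(factors):
    """Return one dict per treatment: every combination of factor levels."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]

def assign_files(treatments, replicates):
    """Map hypothetical CEL file names to treatments, one file per replicate."""
    assignments = {}
    for index, treatment in enumerate(treatments, start=1):
        for rep in range(1, replicates + 1):
            assignments[f"T{index}_rep{rep}.CEL"] = treatment
    return assignments

treatments = enumerate_treatments(factors)
files = assign_files(treatments, replicates=3)
```

With two genotypes and three time points, the sketch yields six treatments, and three replicates per treatment give eighteen raw data files to associate.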
Visualization & Analysis
• Web-based microarray data analysis pipelines integrate a broad set of
probe set query and display options with analysis tools.
• Interactive visualization at all data levels for experiments,
hybridizations, probe sets, and probes.
• Gene list creation with cross-experiment and cross-platform probe
set queries for generating hypotheses about genes of interest.
• Identification of differentially expressed and co-expressed genes with
multiple statistical tests and expression profile filters.
• Pattern recognition on gene lists; methods include hierarchical
clustering, k-means partitioning, PCA, SOM, and multidimensional
scaling (MDS).
• Gene list classification by Gene Ontology.
• Data analysis & visualizations use R and Bioconductor.
• Probe alignments with exemplar sequence.
• Gene prediction through links to the PlantGDB database.
• Cross-species comparative genomics through the Gramene and
GrainGenes databases.
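As an illustration of an expression profile filter of the kind listed above, the following sketch builds a gene list from a log2 fold-change threshold. It is an assumption-laden example, not the BarleyBase implementation (which the poster says relies on R and Bioconductor): the probe set names, signal values, and threshold are hypothetical, and a fold-change cutoff is not a statistical test.

```python
from math import log2
from statistics import mean

# Hypothetical normalized signals: probe set -> treatment -> replicate values.
expression = {
    "probe_set_A": {"control": [100.0, 110.0, 90.0], "treated": [400.0, 420.0, 380.0]},
    "probe_set_B": {"control": [200.0, 210.0, 190.0], "treated": [205.0, 195.0, 200.0]},
    "probe_set_C": {"control": [800.0, 820.0, 780.0], "treated": [100.0, 95.0, 105.0]},
}

def log2_fold_change(values, baseline, condition):
    """Log2 ratio of the mean signal in `condition` to the mean in `baseline`."""
    return log2(mean(values[condition]) / mean(values[baseline]))

def filter_by_fold_change(expression, baseline, condition, min_abs_log2fc=1.0):
    """Keep probe sets whose absolute log2 fold change meets the threshold."""
    selected = {}
    for probe_set, values in expression.items():
        lfc = log2_fold_change(values, baseline, condition)
        if abs(lfc) >= min_abs_log2fc:
            selected[probe_set] = lfc
    return selected

gene_list = filter_by_fold_change(expression, "control", "treated")
```

Here probe_set_A (up fourfold) and probe_set_C (down eightfold) pass the filter, while probe_set_B is unchanged and is dropped.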
BarleyBase collaborates with PlantGDB, Gramene and GrainGenes to
perform gene prediction and cross-species comparison with Barley1
GeneChip exemplar sequences. NASCArrays shares ATH1 data.
BarleyBase houses 20 experiment submissions from barley and
Arabidopsis with a total of 741 hybridizations (as of August 31, 2004).
Figure 1. BarleyBase Overview (diagram labels: User, Internet, BarleyExpress, Raw Data, Data Processing Pipeline with MAS5.0 and RMA, BarleyBase, Query & Analysis, Batch Download as MAGE-ML and CSV)
Data Acquisition & Processing
• Experiment and expression raw data submission by submitter.
• BarleyBase normalizes submitted raw data with the Affymetrix MAS 5.0
statistical algorithm and with Robust Multi-array Average (RMA).
• Compute summary statistics and graphs for raw and normalized expression data.
• Store all types of data in an open-source MySQL database.
• BarleyBase assigns unique accession numbers to experiments, hybridizations &
samples.
• BarleyBase generates MAGE-ML and CSV files for batch download and data
exchange.
• Submission and associated data are available for online access and analysis.
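To make the normalization step more concrete: RMA, as commonly described, combines background correction, quantile normalization, and median-polish summarization. The sketch below shows only the quantile normalization step in plain Python. It is not BarleyBase's pipeline, the intensity values are hypothetical, and ties are broken by position as a simplification.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length lists of probe intensities.

    Each value is replaced by the mean, across arrays, of the values that
    share its rank. Ties are broken by position, which is a simplification.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean of the k-th smallest value over all arrays, for each rank k.
    rank_means = [sum(column) / len(arrays) for column in zip(*sorted_arrays)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]
        normalized.append(out)
    return normalized

# Hypothetical intensities: three arrays, four probes each.
raw = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(raw)
```

After normalization every array has the same distribution of values, so differences between arrays reflect rank changes rather than overall intensity shifts.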
Figure 4. Expression & Annotation for Exemplar Barley1_11969
Figure 2. Major Steps in Experiment Submission
Data Access
• Batch download complete data sets for experiment annotation, raw and
normalized expression data in MAGE-ML, comma-separated values
(CSV), or CEL-file formats.
• Navigate experiment, hybridization, sample, and exemplar data.
• Gene list creation & management for gene-centric analysis.
• Access probe sets based on expression profiles with single- or cross-experiment queries.
• Search genes by biological criteria: annotation, sequence, gene
ontology category, pathway, gene family membership.
• Flexible, submitter-controlled data access, with group access to private
submissions.
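A cross-experiment probe set query followed by a CSV export might look like the sketch below. It assumes an in-memory dictionary in place of the MySQL database, and the experiment accessions, probe set names, signal values, and threshold are hypothetical placeholders rather than real BarleyBase identifiers.

```python
import csv
import io

# Hypothetical mean normalized signal: experiment -> probe set -> value.
signals = {
    "EXP1": {"ps_001": 950.0, "ps_002": 40.0, "ps_003": 610.0},
    "EXP2": {"ps_001": 720.0, "ps_002": 880.0, "ps_003": 505.0},
}

def cross_experiment_query(signals, experiments, min_signal):
    """Return probe sets whose signal is >= min_signal in every experiment."""
    hits = None
    for exp in experiments:
        passing = {ps for ps, value in signals[exp].items() if value >= min_signal}
        hits = passing if hits is None else hits & passing
    return sorted(hits or [])

def to_csv(signals, experiments, probe_sets):
    """Write a probe-set-by-experiment table as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["probe_set"] + list(experiments))
    for ps in probe_sets:
        writer.writerow([ps] + [signals[exp][ps] for exp in experiments])
    return buffer.getvalue()

gene_list = cross_experiment_query(signals, ["EXP1", "EXP2"], min_signal=500.0)
csv_text = to_csv(signals, ["EXP1", "EXP2"], gene_list)
```

The query intersects the passing probe sets of each experiment, so ps_002, which is low in EXP1, is excluded from the exported table.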
Figure 5. Visualization for Hybridizations & Gene Cluster
Future Plans
• Evolve into PlExDB, a comprehensive Plant Expression Data Base.
• Support other major plant species: maize, rice, soybean, wheat.
• Support spotted cDNA and long-oligo microarray platforms.
• Analysis & visualization tool development.
• Cross-experiment, cross-platform & cross-species data analysis.
• Exemplar annotation with Gene Ontology and pathway information.
BarleyBase Data Model
• BarleyBase uses a hierarchical data model to store microarray gene
expression data.
• The top-level data structure is the experiment. Each experiment contains one
or more treatments, each treatment has one or more samples as replicates, and
each sample has one or more hybridizations.
• Protocols are associated with an experiment at the hybridization level.
• Five table types: Array, Expression, Experiment, Protocol, Submitter.
• Follows MIAME principles recommended by MGED and implemented in
MIAMExpress, tuned for plants, with the Extract level removed.
• Adds fields for the factors of a statistical factorial experimental design.
• Enforces plant ontology and controlled vocabulary in experiment
descriptions.
• Biological annotation for probe sets and exemplars with Gene Ontology.
• Supports expression data from Affymetrix GeneChips; spotted
microarray support will be added.
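The hierarchy described under BarleyBase Data Model (experiment, treatment, sample, hybridization, with protocols attached at the hybridization level) can be sketched with plain data classes. This is an illustrative assumption, not the actual MySQL schema: all class names, field names, and accession formats here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Hybridization:
    accession: str
    cel_file: str
    protocol_ids: List[str] = field(default_factory=list)  # protocols attach here

@dataclass
class Sample:
    accession: str
    hybridizations: List[Hybridization] = field(default_factory=list)

@dataclass
class Treatment:
    factor_levels: Dict[str, str]  # one level per experimental factor
    samples: List[Sample] = field(default_factory=list)  # replicates

@dataclass
class Experiment:
    accession: str
    title: str
    treatments: List[Treatment] = field(default_factory=list)

    def hybridization_count(self) -> int:
        """Count hybridizations across all treatments and samples."""
        return sum(len(s.hybridizations) for t in self.treatments for s in t.samples)

# Hypothetical experiment: two treatments, two replicate samples each.
experiment = Experiment(
    accession="EXP-0001",  # placeholder accession format
    title="Hypothetical two-treatment experiment",
    treatments=[
        Treatment(
            factor_levels={"genotype": level},
            samples=[
                Sample(
                    accession=f"SMP-{level}-{rep}",
                    hybridizations=[Hybridization(f"HYB-{level}-{rep}", f"{level}_{rep}.CEL")],
                )
                for rep in (1, 2)
            ],
        )
        for level in ("wildtype", "mutant")
    ],
)
```

Note that there is no Extract level between Sample and Hybridization, mirroring the simplification the poster describes relative to MIAMExpress.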
Figure 3. Gene List Creation, Management & Analysis
Acknowledgments
• The BarleyBase project is funded by the USDA National Research
Initiative (NRI) grant no. 02-35300-12619 and the USDA-CSREES North
American Barley Genome Project.
• PlantGDB, Gramene, GrainGenes, KEGG, and TAIR share tools and genomic
data.
• NASCArrays and TAIR share Arabidopsis ATH1 GeneChip data.
• BarleyBase is hosted at the Iowa State University Virtual Reality
Applications Center.
• Exemplar sequences and BLASTX NR annotations were provided by
HarvEST:Barley.