BarleyBase: BarleyBase.org
BARLEYBASE – AN EXPRESSION PROFILING DATABASE FOR CEREAL GENOMICS
Xiaoyun Tang, Jian Gong, Jianqiang Xin, Lishuang Shen, Stacy Turner, Rico A. Caldo, Dan Nettleton, Roger P. Wise, Julie A. Dickerson*
Virtual Reality Applications Center, Iowa State University, Ames, Iowa 50011
Abstract
BarleyBase is a USDA-funded public repository for plant microarray
data. BarleyBase houses raw and normalized expression data from the
22K Affymetrix Barley1 and Arabidopsis ATH1 GeneChips, presently the
only two available Affymetrix high-density arrays from plants, along with
experiment and sample information.
BarleyBase features a web-based, MIAME-compliant experiment
submission tool, BarleyExpress. BarleyExpress allows users to efficiently
submit and manage their experiment descriptions, array design and
expression analysis information.
BarleyBase offers a broad set of query and display options at all data
levels: experiment, hybridization, probe set, and probe.
Users can query microarray elements by expression profile and by
biological information of the probe sets. Probe set queries are
integrated with visualization and analysis tools such as scatter plots, the R
statistical toolbox, and data filters.
BarleyBase collaborates with the PlantGDB and Gramene databases to
perform gene prediction and cross-species comparison at the genome
level using the Barley1 GeneChip exemplar sequences.
BarleyBase is accessible at http://www.BarleyBase.org/
Data Access
• Download complete data sets for experiment annotation, raw and
normalized expression data in MAGE-ML, comma-separated values
(CSV), or CEL file formats.
• Browse and query experiments, hybridizations, and probe sets.
• Query and filter probe sets by expression profiles.
• Search by biological criteria: annotation keywords, sequence, probe set
names, pathway or gene family membership.
• Create and manage data sets from filtered probe sets.
• Owner-controlled group access to private submissions.
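Filtering probe sets by expression profile, as listed under Data Access, can be illustrated on a downloaded CSV file. The sketch below is only an illustration: the CSV layout, column names, probe set names, and thresholds are all assumptions, not the actual BarleyBase export format.

```python
import csv
import io

# Hypothetical CSV layout: one row per probe set, one column per hybridization.
# Column names, probe set names, and values are invented for illustration.
SAMPLE_CSV = """probe_set,hyb_1,hyb_2,hyb_3
Example_ps_001,120.5,130.1,98.7
Example_ps_002,15.2,14.8,16.1
Example_ps_003,890.0,45.3,910.4
"""


def filter_probe_sets(csv_text, min_signal, min_hybridizations):
    """Return names of probe sets whose signal reaches min_signal
    in at least min_hybridizations hybridizations."""
    selected = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row.pop("probe_set")
        signals = [float(value) for value in row.values()]
        passing = sum(1 for signal in signals if signal >= min_signal)
        if passing >= min_hybridizations:
            selected.append(name)
    return selected


kept = filter_probe_sets(SAMPLE_CSV, min_signal=100.0, min_hybridizations=2)
print(kept)
```

The selected names could then be saved as a data set for later analysis, which is the role the filtered data sets play in the list above.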
[Diagram labels, BarleyBase Overview: BarleyExpress; Data Processing Pipeline; BarleyBase; Batch Download (MAGE-ML, Raw Data, CSV)]
Fig. 2. BarleyBase Homepage
BarleyExpress Submission Steps
• Experiment design information submission.
• Submit experiment factors and factor levels as treatments.
• Batch upload raw GeneChip data.
• Associate raw data files with each studied treatment.
• Protocol submission (optional).
• Sample preparation details for each hybridization.
• Finalize experiment submission.
• Grant access to designated individuals and groups.
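The steps that turn experiment factors and levels into treatments, and then associate raw data files with each treatment, can be sketched as follows. This is a minimal illustration assuming a full factorial design; the factor names, levels, replicate count, and file naming scheme are hypothetical and do not describe how BarleyExpress stores submissions.

```python
from itertools import product

# Hypothetical experiment factors and levels; names are illustrative only.
factors = {
    "genotype": ["wild_type", "mutant"],
    "time_point_h": ["0", "8", "16"],
}


def enumerate_treatments(factors):
    """Build one treatment per combination of factor levels (full factorial)."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def assign_files(treatments, replicates):
    """Map each treatment label to placeholder raw data file names,
    one file per replicate."""
    assignment = {}
    for treatment in treatments:
        label = "_".join(treatment[name] for name in treatment)
        assignment[label] = [
            f"{label}_rep{r}.CEL" for r in range(1, replicates + 1)
        ]
    return assignment


treatments = enumerate_treatments(factors)
files = assign_files(treatments, replicates=3)
print(len(treatments), "treatments,", sum(len(v) for v in files.values()), "files")
```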
Fig. 5. Probe Set Query and Result Visualization
Visualization & Analysis
• Visualization for experiments, hybridizations, probe sets, and probes.
• Data analysis uses data sets obtained from probe set filtering.
• Analysis methods include hierarchical clustering, k-means
partitioning, principal component analysis (PCA), self-organizing maps
(SOM), and multi-dimensional scaling (MDS).
• Identification of differentially expressed and co-expressed genes.
• Most data analysis & visualizations use R and Bioconductor.
• Probe alignments with exemplar sequence.
• Gene prediction through interconnections with PlantGDB database.
• Cross-species comparative genomics through the Gramene database.
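Identification of co-expressed genes is commonly based on the similarity of expression profiles across hybridizations. The sketch below uses Pearson correlation as one possible similarity measure; the poster does not state which measure BarleyBase uses, and the probe set names, values, and threshold are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles.
    Assumes neither profile is constant (non-zero variance)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def co_expressed(profiles, query, threshold):
    """Return probe sets whose correlation with the query profile
    is at least the threshold, sorted by name."""
    target = profiles[query]
    return sorted(
        name
        for name, values in profiles.items()
        if name != query and pearson(target, values) >= threshold
    )


# Hypothetical expression profiles across four hybridizations.
profiles = {
    "ps_a": [1.0, 2.0, 3.0, 4.0],
    "ps_b": [2.1, 4.0, 6.2, 7.9],  # rises with ps_a
    "ps_c": [4.0, 3.0, 2.0, 1.0],  # falls as ps_a rises
    "ps_d": [5.0, 5.1, 4.9, 5.0],  # roughly flat
}

print(co_expressed(profiles, "ps_a", threshold=0.9))
```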
Fig. 1. BarleyBase Overview
[Diagram labels: User; Internet; Query & Analysis; MAS 5.0; RMA]
Data Acquisition & Processing
• Submitters provide experiment descriptions and raw expression data.
• BarleyBase normalizes submitted raw data with two methods: the
Affymetrix MAS 5.0 statistical algorithm and RMA (Robust Multi-array
Average) from Bioconductor.
• Summary statistics and graphs are computed for raw and normalized
expression data to support quality diagnostics.
• All data types are stored in an open-source MySQL database.
• BarleyBase assigns unique accession numbers to experiments,
hybridizations & samples.
• BarleyBase generates MAGE-ML files and CSV files for batch
download.
• Submitted experiments and their associated data are available for online
access and analysis.
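RMA combines background correction, quantile normalization, and probe-level summarization. The sketch below illustrates only the quantile normalization step on toy data, in plain Python. It is not the Bioconductor implementation that the processing steps above refer to, and it breaks ties arbitrarily rather than averaging them.

```python
def quantile_normalize(arrays):
    """Force every array onto the same empirical distribution.

    arrays: list of equal-length lists, one list per hybridization.
    Ties within an array are broken by position (a simplification).
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [
        sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)
    ]
    normalized = []
    for a in arrays:
        # Indices of this array ordered from smallest to largest value.
        order = sorted(range(n), key=lambda i: a[i])
        result = [0.0] * n
        for rank, index in enumerate(order):
            result[index] = reference[rank]
        normalized.append(result)
    return normalized


# Toy intensities: three hybridizations, three probes each.
raw = [
    [5.0, 2.0, 3.0],
    [4.0, 1.0, 6.0],
    [3.0, 4.0, 8.0],
]
normalized = quantile_normalize(raw)
for row in normalized:
    print(row)
```

After normalization every hybridization contains the same set of values, so differences between arrays reflect rank changes rather than overall intensity shifts.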
BarleyBase Data Model
• BarleyBase uses a hierarchical data model to store gene expression data
that is based on the Affymetrix GeneChip data formats.
• The top-level data structure is the experiment. Each experiment contains
one or more treatments, each treatment has one or more samples as
replicates, and each sample has one or more hybridizations.
• Protocols are associated with an experiment at the hybridization level.
• Five types of tables: Array, Expression, Experiment, Protocol, Submitter.
• Follows MIAME principles recommended by MGED and implemented in
MIAMExpress, but removes the Extract level and captures the information
for hybridization protocol.
• Adds fields for statistical experimental design factors.
• Uses plant ontology and controlled vocabulary in experiment descriptions.
• Includes biological annotation for microarray probe sets and exemplars.
• Presently stores expression data from Affymetrix GeneChips only.
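The experiment, treatment, sample, hybridization hierarchy described above can be sketched with nested records. This is an illustrative in-memory model only: the class names, field names, and accession strings are assumptions and do not reflect the actual BarleyBase MySQL schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Hybridization:
    accession: str  # hypothetical accession format
    raw_file: str


@dataclass
class Sample:
    accession: str
    hybridizations: List[Hybridization] = field(default_factory=list)


@dataclass
class Treatment:
    factor_levels: Dict[str, str]
    samples: List[Sample] = field(default_factory=list)  # replicates


@dataclass
class Experiment:
    accession: str
    treatments: List[Treatment] = field(default_factory=list)

    def hybridization_count(self):
        """Count hybridizations across all treatments and samples."""
        return sum(
            len(sample.hybridizations)
            for treatment in self.treatments
            for sample in treatment.samples
        )


# Two treatments, two replicate samples each, one hybridization per sample.
experiment = Experiment(accession="EXP-0001")
for level in ("control", "inoculated"):
    treatment = Treatment(factor_levels={"condition": level})
    for rep in (1, 2):
        sample = Sample(accession=f"SMP-{level}-{rep}")
        sample.hybridizations.append(
            Hybridization(accession=f"HYB-{level}-{rep}", raw_file=f"{level}_{rep}.CEL")
        )
        treatment.samples.append(sample)
    experiment.treatments.append(treatment)

print(experiment.hybridization_count())
```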
Fig. 3. Major Steps in Experiment Submission
BarleyExpress Features
• MIAME-compliant, web-based data submission and annotation tool.
• Experiment, array design, protocol, sample, expression submissions.
• Enforces plant ontology in collaboration with Gramene.
• Uses controlled vocabulary for descriptions wherever possible.
• First database to explicitly capture information on experiment factors
and levels for presenting experiments in a factorial design.
• Images and other supporting information can be uploaded.
• Minimal requirements on user’s computer skills and effort.
• Flexible access control lets submitters grant designated individuals or
groups access to their private data before publication.
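Owner-controlled access to private submissions can be sketched as a simple permission check. This is a hypothetical illustration of the idea: the record fields, user names, and group names are assumptions, and the poster does not describe how BarleyExpress implements access control.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Submission:
    owner: str
    public: bool = False  # set to True once the data are released
    allowed_users: Set[str] = field(default_factory=set)
    allowed_groups: Set[str] = field(default_factory=set)


def can_view(submission, user, user_groups):
    """Return True if the user may see the submission."""
    if submission.public or user == submission.owner:
        return True
    if user in submission.allowed_users:
        return True
    # Access through membership in any group the owner has designated.
    return bool(submission.allowed_groups & set(user_groups))


private = Submission(
    owner="submitter_a",
    allowed_users={"collaborator_b"},
    allowed_groups={"lab_group_1"},
)

print(can_view(private, "collaborator_b", []))
print(can_view(private, "visitor_c", ["lab_group_1"]))
print(can_view(private, "visitor_d", ["other_group"]))
```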
Fig. 6. Graphs for Hybridization Expression & Cluster
Future Plans
1. Cross-experiment analysis.
2. Visualization and analysis tool development.
3. Barley1 exemplar annotation.
Acknowledgments
1. BarleyBase is funded by USDA-NRI/CGP #2002-03582; USDA-CSREES
North American Barley Genome Project; USDA Initiative for Future
Agriculture and Food Systems (IFAFS) #01-52100-11346.
2. PlantGDB, Gramene, KEGG, TAIR for providing tools or genomic data.
3. Many people who provided technical support and advice on BarleyBase
development.
Fig. 4. Probe Alignment with Barley1 GeneChip Exemplar