HCLSIG/Meetings/2009-11-02_F2F/BioRDF_F2F_4


BioRDF Breakout
- Introduction – Kei Cheung
- MAGE-TAB – Michael Miller
- voiD – Jun Zhao (remote)
- aTag – Matthias Samwald (remote)
- Discussion – All

1
BioRDF Breakout: Microarray Use Case
Kei Cheung, Ph.D.
Associate Professor
Yale Center for Medical Informatics
HCLS IG Face-to-Face Meeting, Santa Clara, California, November 2-3, 2009
2
Introduction
- Whole-genome expression profiling has revolutionized the way we study disease and basic biology.
- DNA microarrays allow scientists to quantify thousands of genomic features in a single experiment.
- Since 1997, the number of publications per year based on an analysis of gene expression microarray data has grown from 30 to over 5,000.
- Major public microarray data repositories have been created in different countries (e.g., NCBI GEO, EBI ArrayExpress, and CIBEX).
3
Microarray Workflow
4
An Example of Differentially Expressed Genes
5
Importance of Integrating Microarray Data
- Because many microarray experiments are costly and have low reproducibility, it is not surprising that each study includes only a limited number of patient samples.
- Very few marker genes are identified in common across different studies of patients with the same disease.
- Merging data sets from multiple studies increases the sample size, which may in turn increase the power of statistical inference; doing so is of great interest but also a challenge.
- Integrating external information resources is essential for interpreting intrinsic patterns and relationships in large-scale gene expression data.
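A minimal Python sketch of one simple way to merge studies, assumed here for illustration rather than taken from the slide or from the papers cited in this deck: standardize each gene within its own study, so that study-specific offsets and scales are removed, then keep only the genes measured in every study. The data layout (gene ID mapped to one value per sample) and the function names are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical layout: gene ID -> expression values, one per sample in that study.
Study = dict[str, list[float]]


def standardize(values: list[float]) -> list[float]:
    """Convert one gene's values within a single study to z-scores."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # A constant gene carries no within-study signal; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def merge_studies(*studies: Study) -> Study:
    """Keep genes present in every study and concatenate their per-study z-scores."""
    common = set.intersection(*(set(study) for study in studies))
    merged: Study = {}
    for gene in sorted(common):
        row: list[float] = []
        for study in studies:
            row.extend(standardize(study[gene]))
        merged[gene] = row
    return merged


# Made-up values for two small studies.
study_a = {"GENE1": [1.0, 2.0, 3.0], "GENE2": [5.0, 6.0, 7.0], "GENE3": [2.0, 4.0, 6.0]}
study_b = {"GENE1": [10.0, 30.0], "GENE3": [7.0, 7.0]}
merged = merge_studies(study_a, study_b)
print(sorted(merged))  # ['GENE1', 'GENE3']
```

Per-study standardization is only one option; it ignores platform differences and probe-to-gene mapping, which a real cross-study analysis would also have to handle.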
6
Microarray Data Standards
- MGED
- MIAME
- MAGE-ML
- MAGE-TAB

7
Some Examples
- Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes (Jiang et al. 2004, BMC Bioinformatics)
- Large-scale integration of cancer microarray data identifies a robust common cancer signature (Xu et al. 2007, BMC Bioinformatics)
- What about neuroscience?
8
Access to and Use of Microarray Data in Neuroscience
- NIH Neuroscience Microarray Consortium
- Public repositories such as GEO and ArrayExpress (including data generated from neuroscience microarray experiments)
- Brain atlases (e.g., Allen Brain Atlas and GenSAT)

9
Ontology-Based Integration
[Figure: Microarray experiments 1 and 2 are linked through a neuron ontology whose classes are Brain region (e.g., entorhinal cortex, hippocampus, primary visual cortex), Layer (e.g., Layer 2 of the entorhinal cortex), and Neuron (e.g., stellate island neuron, pyramidal neuron), connected by part-of and input-to relations.]
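The integration idea on this slide, where an experiment annotated at the neuron or layer level should still be found by a query about the enclosing brain region, can be sketched as a transitive part-of lookup. The ontology fragment, the experiment annotations, and the function names below are hypothetical; only the term names come from the slide, and the input-to relation is not modeled.

```python
# Hypothetical ontology fragment: each term points to the term it is part of.
PART_OF = {
    "stellate island neuron": "Layer 2 of the entorhinal cortex",
    "Layer 2 of the entorhinal cortex": "entorhinal cortex",
}

# Hypothetical experiment annotations at different levels of the hierarchy.
ANNOTATIONS = {
    "experiment 1": {"stellate island neuron"},
    "experiment 2": {"entorhinal cortex"},
    "experiment 3": {"hippocampus"},
}


def part_of_closure(term: str) -> set[str]:
    """Return the term plus every term it is transitively part of (cycle-safe)."""
    seen = {term}
    while term in PART_OF and PART_OF[term] not in seen:
        term = PART_OF[term]
        seen.add(term)
    return seen


def experiments_about(region: str) -> list[str]:
    """Experiments annotated with the region itself or with any of its parts."""
    return sorted(
        exp
        for exp, terms in ANNOTATIONS.items()
        if any(region in part_of_closure(t) for t in terms)
    )


print(experiments_about("entorhinal cortex"))  # ['experiment 1', 'experiment 2']
```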
10
Example Federated Queries
- Retrieve a list of genes differentially expressed between brain regions (e.g., hippocampus and entorhinal cortex) in normally aged human subjects.
- Retrieve a list of genes differentially expressed in the same brain region between normal human subjects and Alzheimer's disease (AD) patients.
- Using these gene lists, one can issue (federated) queries to retrieve additional information about the genes for various types of analysis (e.g., GO term enrichment).
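The first two queries assume that a differentially expressed gene list can be computed for each comparison. A minimal sketch using Welch's t statistic with a fixed cutoff, both assumptions for illustration: a real analysis would use p-values with multiple-testing correction (for example, the Bioconductor limma package), and the expression values below are made up.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical layout: gene ID -> expression values for the samples in one group.
Group = dict[str, list[float]]


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic for mean(a) - mean(b); each group needs >= 2 samples."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # No within-group variability: return 0.0 so it never passes a positive cutoff.
        return 0.0
    return (mean(a) - mean(b)) / se


def differentially_expressed(
    group_a: Group, group_b: Group, threshold: float
) -> list[tuple[str, float]]:
    """Genes measured in both groups with |t| >= threshold, strongest first."""
    scores = {
        gene: welch_t(group_a[gene], group_b[gene])
        for gene in group_a.keys() & group_b.keys()
    }
    hits = [(gene, t) for gene, t in scores.items() if abs(t) >= threshold]
    return sorted(hits, key=lambda hit: abs(hit[1]), reverse=True)


hippocampus = {"G1": [5.0, 5.2, 4.8], "G2": [1.0, 1.1, 0.9]}
entorhinal_cortex = {"G1": [1.0, 1.2, 0.8], "G2": [1.0, 0.9, 1.1]}
print([gene for gene, _ in differentially_expressed(hippocampus, entorhinal_cortex, 3.0)])  # ['G1']
```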
11
Microarray Experiment Descriptions
E-GEOD-3296 Transcription profiling of primary mouse embryonic fibroblasts (MEFs) from C57Bl/6x129/Sv F2 e14.5 embryos that contain a deletion in the CH1 domain of three of four alleles of CBP and p300
The CH1 protein interaction domain of the transcriptional coactivators p300 and CBP is thought to interact with HIF-1alpha, and this interaction is thought to be critical to the expression of HIF-1alpha target genes in response to hypoxia. Trichostatin A (TSA), an inhibitor of histone deacetylases, has been reported to repress the expression of HIF-1alpha target genes. To test the requirement of the CH1 domain and TSA for gene expression in response to dipyridyl (a hypoxia mimetic), primary MEFs were generated from C57Bl/6x129/Sv F2 e14.5 embryos that contain a deletion in the CH1 domain of three of four alleles of CBP and p300. The remaining allele of p300 or CBP was a conditional knockout allele. Control MEFs with only a single conditional knockout allele of p300 or CBP were also generated. At passage 3, MEFs were infected with Cre adenovirus and grown until they had expanded at least 100-fold. Subconfluent MEFs were treated with ethanol vehicle or 100 ng/ml TSA with 5% carbon dioxide at 37 °C in a humid chamber for 30 min, followed by ethanol vehicle or 100 µM dipyridyl (DP) for an additional 3 h. Immediately after treatment, cells were lysed in Trizol for RNA extraction.
E-GEOD-3327 Transcription profiling of different regions of mouse brain to study adult mouse gene expression patterns in common strains
Adult mouse gene expression patterns in common strains. Experiment Overall Design: six mouse strains and seven brain regions were analyzed.
E-GEOD-358 Transcription profiling of rat whole brain samples from animals with repeated exposure to the anaesthetic isoflurane
12 controls, 3 with 5 exposures, and 3 with 10 exposures. Rats were exposed to 1.0% isoflurane for 90 minutes twice a day, for a total of 5 or 10 exposures. Animals did not require intubation. All exposures and hybridizations were performed at the University of Pennsylvania.
12
Open Biomedical Annotator
13
Some Results
- Two microarray experiments (E-GEOD-4034, E-GEOD-4035) contain the following set of terms: fear, hippocampus, mouse.
- These microarray experiments study the role of the hippocampus in fear, using the mouse as the model organism.
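Once each experiment description has been tagged with ontology terms (for example, by the Open Biomedical Annotator), finding experiments like these reduces to a subset test on term sets. A minimal sketch; the dictionary layout and function name are assumptions, the annotator's own API is not used, and apart from the shared terms of E-GEOD-4034 and E-GEOD-4035 the annotation sets are illustrative.

```python
# Illustrative annotation sets. Only the shared terms of E-GEOD-4034 and
# E-GEOD-4035 come from the slide; the other entries are hypothetical.
ANNOTATIONS = {
    "E-GEOD-4034": {"fear", "hippocampus", "mouse"},
    "E-GEOD-4035": {"fear", "hippocampus", "mouse"},
    "E-GEOD-3327": {"brain", "mouse"},
    "E-GEOD-358": {"brain", "rat", "isoflurane"},
}


def experiments_with_terms(annotations: dict[str, set[str]], query: set[str]) -> list[str]:
    """IDs of experiments whose annotation set includes every query term."""
    return sorted(eid for eid, terms in annotations.items() if query <= terms)


print(experiments_with_terms(ANNOTATIONS, {"fear", "hippocampus", "mouse"}))
# ['E-GEOD-4034', 'E-GEOD-4035']
```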

14
Analysis tools
- Bioconductor
- GenePattern
- GeneSpring

15
Inter-community Collaboration
- HCLS (BioRDF)
- MGED (ArrayExpress)
- NIF (NeuroLex)
- Ontology community (NCBO)

16
Web of silos
CEL, GPR, etc.
17
Semantic Web = Brilliant Web!
18
The End
19
Discussion
- What should the RDF structure be?
- Extension of SPARQL to empower data analysis
- Workflow and provenance
- Visualization
- How to integrate databases and literature
- Integration of other types of data
- Inter-community collaboration
- Translational use cases
20
What should the RDF structure be?
- Experiments
- Samples
- Experimental conditions/factors
- Gene lists
- Arrays/chips
- Raw/processed data (e.g., CEL, GPR, gene matrix)
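As a starting point for this question, the entities listed above could be expressed as subject-predicate-object triples. The sketch below uses plain Python tuples instead of an RDF library; the namespace `http://example.org/microarray#`, the predicate names (`hasSample`, `brainRegion`, `rawDataFile`, and so on), and the data are hypothetical, and the small `query` function only imitates a SPARQL basic graph pattern (no OPTIONAL, FILTER, or entailment). It is not a proposed BioRDF schema.

```python
# Hypothetical namespace and predicate names, for illustration only.
EX = "http://example.org/microarray#"

TRIPLES = [
    (EX + "experiment1", EX + "hasSample", EX + "sample1"),
    (EX + "experiment1", EX + "hasSample", EX + "sample2"),
    (EX + "experiment1", EX + "usesArray", EX + "array1"),
    (EX + "sample1", EX + "brainRegion", "hippocampus"),
    (EX + "sample2", EX + "brainRegion", "entorhinal cortex"),
    (EX + "sample1", EX + "rawDataFile", "sample1.CEL"),
    (EX + "geneList1", EX + "derivedFrom", EX + "experiment1"),
]


def _bind(term: str, value: str, binding: dict[str, str]) -> bool:
    """Match one pattern term against a value; terms starting with '?' are variables."""
    if not term.startswith("?"):
        return term == value
    if term in binding:
        return binding[term] == value
    binding[term] = value
    return True


def query(
    patterns: list[tuple[str, str, str]], triples: list[tuple[str, str, str]]
) -> list[dict[str, str]]:
    """Join triple patterns left to right, like a SPARQL basic graph pattern."""
    results: list[dict[str, str]] = [{}]
    for pattern in patterns:
        extended = []
        for binding in results:
            for triple in triples:
                candidate = dict(binding)
                if all(_bind(p, v, candidate) for p, v in zip(pattern, triple)):
                    extended.append(candidate)
        results = extended
    return results


# Which samples of experiment1 come from the hippocampus, and where is their raw data?
rows = query(
    [
        (EX + "experiment1", EX + "hasSample", "?sample"),
        ("?sample", EX + "brainRegion", "hippocampus"),
        ("?sample", EX + "rawDataFile", "?file"),
    ],
    TRIPLES,
)
print(rows[0]["?file"])  # sample1.CEL
```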

21
Extension of SPARQL
- Hierarchical queries
- Statistical analyses/tests
- Enrichment analysis
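The slide does not say how statistical tests would be exposed through SPARQL. As a reference point, the one-sided hypergeometric test, a common choice for GO term enrichment, is short enough to sketch directly. It runs outside any SPARQL engine, and the gene sets and function names are hypothetical.

```python
from math import comb


def hypergeometric_pvalue(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k), X ~ Hypergeometric: N background genes, K annotated, n drawn."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def enrichment_pvalue(gene_list: set[str], term_genes: set[str], background: set[str]) -> float:
    """Over-representation of a term's genes in a gene list, relative to a background."""
    genes = gene_list & background
    annotated = term_genes & background
    return hypergeometric_pvalue(
        len(genes & annotated), len(genes), len(annotated), len(background)
    )


background = {f"G{i}" for i in range(1, 11)}  # 10 measured genes
term_genes = {"G1", "G2", "G3", "G4", "G5"}  # 5 annotated with a hypothetical GO term
print(enrichment_pvalue({"G1", "G2"}, term_genes, background))  # 0.2222222222222222
```

In practice, one such p-value is computed per term and then corrected for multiple testing.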

22
Workflow and provenance
- Taverna
- BioMoby
- GenePattern

23
Visualization
- Cytoscape
- TreeView

24
How to integrate databases and literature
25
Inter-community Collaboration
- NCBO
- SWAN

26
What other types of data can be integrated with microarray data?
27
Translational use cases
28