HCLSIG_BioRDF_Subgroup$$QueryFederation2

BioRDF Update
Kei Cheung, Ph.D.
Yale Center for Medical Informatics
CSHALS 2010: HCLS Tutorial, Boston, February 23, 2010
Current participants
• Kei Cheung (Yale University)
• Helena Deus (University of Texas)
• Don Doherty (Brainstage)
• Rob Frost (Vector C)
• Scott Marshall (University of Amsterdam)
• Michael Miller (Teranode)
• Adrian Paschke (Freie Universität Berlin)
• Eric Prud'hommeaux (W3C)
• Satya Sahoo (Wright State University)
• Matthias Samwald (DERI and Konrad Lorenz Institute)
• Jun Zhao (Oxford University)
Current tasks
• Query Federation
– Semantic integration of neuroscience microarray data and related data
– Expansion of previous query federation work (Cheung et al. A journey to semantic web query federation in the life sciences. BMC Bioinformatics. 10(Suppl 10):S10, 2009)
• Traditional Chinese Medicine (TCM)
– Collaboration with LODD
– Linking TCM data with other types of data, including drug data
Query federation in the context of neuroscience microarray data
Gene expression in the neurosciences
• Gene expression may be an indicator of how well someone is aging
• The New England Centenarian Study
• DNA microarray technology allows scientists to measure the expression of tens of thousands of genes in a single sample at once and then link those genes to specific biological functions
Microarray examples/use cases
• NIH Neuroscience Microarray Consortium and EBI ArrayExpress
• RDF representation of experiment metadata and gene lists, including provenance
Representative concepts
• Disease (e.g., AD, PD)
• Neuron (e.g., dopamine neuron)
• Brain region (e.g., hippocampus, posterior cingulate cortex, visual cortex)
• Brain function (e.g., unimodal and heteromodal sensory association)
• Organism (e.g., human)
• Experimental factor (e.g., normal vs. AD)
• Sample extraction method (e.g., laser-capture microdissection)
• Proteins (e.g., NFT)
• Genes (e.g., gene lists)
• Biological process (e.g., energy metabolism)
• Cellular component (e.g., mitochondrial electron transport chain)
• Age
• Disease state
• Treatment
Approach
• Reuse existing ontological terms and relationships (e.g., OBO Relation Ontology)
• Use Provenir ontology and aTags
• Create RDF representation of gene lists
RDF representation of gene expression lists
[Figure: RDF representation of Genelist 1, Genelist 2, and Genelist 3; components shown: gene-specific annotation, sample-specific gene expression values, and aggregated gene expression values.]
RDF graph
[Figure: example RDF graph. Node gene1 has probeid “225871_at”, symbol “STEAP2”, and name “six transmembrane epithelial antigen of prostate 2”. Node sample1 has Brain_region “Entorhinal Cortex”, Disease_status “AD”, and Organism “Human”. Two Expression_value_for_gene1_sample1_pair nodes record value 503.7 with context “Signal” and value “P” with context “Detection”.]
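To make the graph concrete, here is a minimal Python sketch (standard library only) that writes the example nodes as N-Triples. The literal values come from the slide; the namespace `http://example.org/neuro#`, the helper `ex`, the node names `signal_gene1_sample1` and `detection_gene1_sample1`, and the `gene`/`sample` links from each expression-value node are hypothetical, since the slides do not give the project's actual vocabulary.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# ASSUMPTION: hypothetical namespace; the BioRDF vocabulary is not given in the slides.
EX = "http://example.org/neuro#"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


@dataclass(frozen=True)
class IRI:
    """An IRI node, kept distinct from plain string literals."""
    value: str


def ex(local: str) -> IRI:
    """Mint an IRI in the hypothetical example namespace."""
    return IRI(EX + local)


Term = Union[IRI, str, float]
Triple = Tuple[IRI, IRI, Term]

TRIPLES: List[Triple] = [
    # Gene-specific annotation (values from the slide)
    (ex("gene1"), ex("probeid"), "225871_at"),
    (ex("gene1"), ex("symbol"), "STEAP2"),
    (ex("gene1"), ex("name"), "six transmembrane epithelial antigen of prostate 2"),
    # Sample-specific metadata (values from the slide)
    (ex("sample1"), ex("Brain_region"), "Entorhinal Cortex"),
    (ex("sample1"), ex("Disease_status"), "AD"),
    (ex("sample1"), ex("Organism"), "Human"),
    # Each gene/sample expression value is its own node carrying a context.
    # ASSUMPTION: these node names and the gene/sample links are hypothetical.
    (ex("signal_gene1_sample1"), ex("gene"), ex("gene1")),
    (ex("signal_gene1_sample1"), ex("sample"), ex("sample1")),
    (ex("signal_gene1_sample1"), ex("value"), 503.7),
    (ex("signal_gene1_sample1"), ex("context"), "Signal"),
    (ex("detection_gene1_sample1"), ex("gene"), ex("gene1")),
    (ex("detection_gene1_sample1"), ex("sample"), ex("sample1")),
    (ex("detection_gene1_sample1"), ex("value"), "P"),
    (ex("detection_gene1_sample1"), ex("context"), "Detection"),
]


def term(t: Term) -> str:
    """Render one term in N-Triples syntax (IRI, xsd:double, or plain string)."""
    if isinstance(t, IRI):
        return f"<{t.value}>"
    if isinstance(t, float):
        return f'"{t}"^^<{XSD_DOUBLE}>'
    escaped = t.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize triples, one statement per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
```

Modeling each expression value as its own node, rather than as a direct gene-to-sample property, is one way to let a value carry extra attributes such as its context.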
Live to 100!
Featured Blog (from Ask Dr. Mao)
Feeling like the absent-minded professor lately?
Ginkgo, the oldest surviving species of tree, has been traced back 300 million years and is one of the most widely studied plants. The leaf of the ginkgo tree is shaped like a human brain, and some believe this is why, in Asia, it has always had a reputation for benefiting the mental processes. A dwindling memory and decreased concentration are largely caused by decreased blood flow to the brain and loss of brain cells; ginkgo has been confirmed to boost circulation to the brain and other organs, improving memory and cognitive functions. Additionally, ginkgo is used far and wide as a longevity tonic in Asia and Europe. The best-known and most commonly available forms of ginkgo are teas and herbal extracts, but ginkgo nut, used in the culinary traditions of Asian cultures, also has therapeutic properties and is also said to strengthen lung function.
TCM project milestone
• Collaboration between BioRDF and LODD
• A paper was recently submitted to BMC Chinese Medicine (Thematic Series: Semantic Web for Chinese Medicine)
• Samwald et al. Integrating findings from traditional medicine into modern pharmaceutical research through semantic technologies
– Linking diverse data on herbs that have been studied for their potential therapeutic effects on depression
• Data sources: TCMGeneDIT, PubMed, DBpedia, PharmGKB
• Semantic Web technologies: aTag (including an aTag explorer), RDFa, and SPARQL endpoint
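The TCM project lists a SPARQL endpoint among its technologies. As a rough sketch of how a client might query such an endpoint, this Python code builds a query-via-GET request following the W3C SPARQL 1.1 Protocol and flattens a response in the SPARQL 1.1 Query Results JSON format. The endpoint URL, the `ex:` prefix, and the `ex:associatedGene` predicate are hypothetical placeholders, not the project's actual endpoint or vocabulary.

```python
import json
from typing import Dict, List
from urllib.parse import quote, urlencode
from urllib.request import Request

# ASSUMPTION: hypothetical endpoint URL and vocabulary, for illustration only.
ENDPOINT = "http://example.org/tcm/sparql"
QUERY = """PREFIX ex: <http://example.org/tcm#>
SELECT ?herb ?gene WHERE { ?herb ex:associatedGene ?gene } LIMIT 10"""


def build_query_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol query-via-GET request asking for JSON results."""
    # quote (not quote_plus) percent-encodes spaces as %20 and '#' as %23.
    url = endpoint + "?" + urlencode({"query": query}, quote_via=quote)
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(body: str) -> List[Dict[str, str]]:
    """Flatten SPARQL 1.1 JSON results into {variable: value} rows.

    Variables left unbound in a solution are simply omitted from that row.
    """
    doc = json.loads(body)
    variables = doc["head"]["vars"]
    return [
        {v: binding[v]["value"] for v in variables if v in binding}
        for binding in doc["results"]["bindings"]
    ]


if __name__ == "__main__":
    req = build_query_request(ENDPOINT, QUERY)
    print(req.full_url)
    # Sending it would be: urllib.request.urlopen(req).read().decode("utf-8")
```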
Future plan
• Federate microarray data with other types of data, including data stored in HCLS knowledge bases (KBs) (e.g., pathway data and disease/phenotype data)
• Explore a range of federated queries
• Demos (e.g., iPhone application)
• Expand the TCM project
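The slides do not say how federated queries will be executed. Purely as an illustration, this Python sketch uses a generic mediator approach: each source answers its own basic graph pattern, and the partial results are joined on their shared variables (here `?symbol`). The two in-memory sources stand in for remote endpoints such as a microarray gene-list store and a pathway knowledge base; every identifier and gene symbol in them (`ex:genelist1`, `GENE_A`, `ex:pathwayX`, and so on) is hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# ASSUMPTION: two toy in-memory sources with hypothetical identifiers and
# placeholder gene symbols; they stand in for remote SPARQL endpoints.
MICROARRAY: List[Triple] = [
    ("ex:gene1", "ex:inGeneList", "ex:genelist1"),
    ("ex:gene2", "ex:inGeneList", "ex:genelist1"),
    ("ex:gene1", "ex:symbol", "GENE_A"),
    ("ex:gene2", "ex:symbol", "GENE_B"),
]
PATHWAY: List[Triple] = [
    ("ex:pathwayX", "ex:hasMemberSymbol", "GENE_A"),
    ("ex:pathwayY", "ex:hasMemberSymbol", "GENE_C"),
]


def match(pattern: Triple, triples: List[Triple]) -> List[Binding]:
    """Bindings for one triple pattern; terms starting with '?' are variables."""
    results: List[Binding] = []
    for triple in triples:
        binding: Binding = {}
        for pat_term, value in zip(pattern, triple):
            if pat_term.startswith("?"):
                if binding.setdefault(pat_term, value) != value:
                    break  # same variable would be bound to two different values
            elif pat_term != value:
                break  # constant term does not match
        else:
            results.append(binding)
    return results


def join(left: List[Binding], right: List[Binding]) -> List[Binding]:
    """Nested-loop join: merge bindings that agree on all shared variables."""
    return [
        {**lb, **rb}
        for lb in left
        for rb in right
        if all(lb[v] == rb[v] for v in lb.keys() & rb.keys())
    ]


def evaluate(patterns: List[Triple], source: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern against a single source."""
    results: List[Binding] = [{}]
    for pattern in patterns:
        results = join(results, match(pattern, source))
    return results


def federate() -> List[Binding]:
    """Send one sub-query to each source, then join the results on ?symbol."""
    genes = evaluate(
        [("?gene", "ex:inGeneList", "ex:genelist1"), ("?gene", "ex:symbol", "?symbol")],
        MICROARRAY,
    )
    pathways = evaluate([("?pathway", "ex:hasMemberSymbol", "?symbol")], PATHWAY)
    return join(genes, pathways)


if __name__ == "__main__":
    print(federate())
```

A nested-loop join keeps the sketch short; federation engines typically push selective patterns to each endpoint and use hash or bind joins to avoid shipping large intermediate results.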
The End