HCLSIG_BioRDF_Subgroup$W3C

Semantic Web Development in
Traditional Chinese Medicine
Huajun Chen
Zhejiang University
What’s TCM?
TCM Semantic Web
TCM Ontology Engineering
TCM Semantic Search Engine (DartGrid System)
Semantic Graph Mining for biomedical network analysis
What’s TCM?
  Traditional Chinese Medicine (TCM) is an ancient medical
system that accounts for around 40% of all health care
delivered in China.
 Preventive Medicine
  Medicine is taken like a daily nutritional supplement or as part of the diet to
maintain the balance of the whole body system.
 Personalized Medicine
  Treatment can be completely different for people of different gender, age,
or health condition, even when they have very similar symptoms.
 Empirical Medicine
  The effects of many TCM drugs are established by more than one thousand years of
practice, although the specific underlying mechanisms are not known.
TCM Knowledge
 TCM theories derive from many knowledge sources
including the theories of Yin-Yang, Chinese five elements, the
human body channel system, Zang Fu organ theory, holistic
connections, mind-body intervention, and many others.
  TCM practice includes diagnosis and treatment theories
such as herbal medicine, massage and cupping,
acupuncture and meridians.
TCM Semantic Web Project
A project in collaboration with China Academy of Traditional
Chinese Medicine.
The ultimate vision of the TCM Semantic Web
The Subprojects
 TCM Ontology Engineering (2001-current).
 The DartGrid Data Integration System (first started in 2002)
  Integrating legacy relational databases into the Semantic Web
  DartMapper: Visualized relational-2-RDF Mapper (2003-2005)
 DartQuery: SPARQL2SQL Query Rewriter and a Form-based SPARQL query
builder (2003-2006)
 DartSearch: Semantic Search. (2005-current)
 Semantic Data Analysis and Data Mining for Semantic Web
 DartSpora: semantic data analysis engine (2007-current)
  Semantic Graph Mining for biomedical network analysis. (2007-current)
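The relational-2-RDF mapping idea behind DartMapper can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `herb` table, its columns, the `example.org` namespace, and the sample rows are hypothetical and are not taken from DartGrid.

```python
# Hypothetical mapping: table name, columns and namespace are illustrative only.
BASE = "http://example.org/tcm/"

MAPPING = {
    "table": "herb",
    "class": BASE + "Herb",
    "key": "id",
    "columns": {
        "name": BASE + "name",
        "treats": BASE + "treats",
    },
}


def rows_to_triples(rows, mapping):
    """Turn each relational row into (subject, predicate, object) triples."""
    triples = []
    for row in rows:
        # The primary key becomes part of the subject identifier.
        subject = f"{BASE}{mapping['table']}/{row[mapping['key']]}"
        triples.append((subject, "rdf:type", mapping["class"]))
        for column, predicate in mapping["columns"].items():
            value = row.get(column)
            if value is not None:  # NULL columns produce no triple
                triples.append((subject, predicate, value))
    return triples


# Made-up sample rows standing in for a legacy relational table.
rows = [
    {"id": 1, "name": "ginseng", "treats": "fatigue"},
    {"id": 2, "name": "ginger", "treats": None},
]
triples = rows_to_triples(rows, MAPPING)
for triple in triples:
    print(triple)
```

Keeping the mapping as data rather than code is one plausible reason a visual mapper is useful: the same mapping record can drive both triple generation and query rewriting.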
TCM Ontology Engineering
  An effort involving more than 100 people
from over 30 TCM research institutes
located in different parts of China
 Scale
 More than 20,000 classes and
100,000 instances defined in the
current ontology
 Service
  Web APIs for ontology-based applications.
The current TCM ontology contains 15 major
categories for each sub-domain.
Ontology visualization and query engine
TCM Semantic Search Engine
A semantic search engine built upon many relational databases.
System Architecture
Search Service supports full-text search in all databases, and
semantic navigation through the results across database boundaries.
Ontology Service is used to expose the RDF/OWL ontologies.
Semantic Query Service is used to process SPARQL semantic queries.
Semantic Registration Service maintains the semantic
mapping information.
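How a query service might use registered mapping information to answer a SPARQL-style triple pattern from a relational database can be sketched as follows. This is a toy under stated assumptions, not DartQuery's algorithm: the `REGISTRY` dictionary, the `tcm:` predicates, the `herb` table and its rows are all hypothetical, and a real SPARQL-to-SQL rewriter must also handle joins, filters and optional patterns.

```python
import sqlite3

# Hypothetical registration: predicate -> (table, key column, value column).
REGISTRY = {
    "tcm:name": ("herb", "id", "name"),
    "tcm:treats": ("herb", "id", "treats"),
}


def rewrite(pattern):
    """Rewrite one triple pattern (?s, predicate, object) into SQL.

    The object is either a variable such as "?o" or a literal value.
    Returns the SQL text and its parameter list.
    """
    _subject, predicate, obj = pattern
    table, key, column = REGISTRY[predicate]
    sql = f"SELECT {key}, {column} FROM {table} WHERE {column} IS NOT NULL"
    params = []
    if not obj.startswith("?"):
        # A literal object becomes an equality condition.
        sql += f" AND {column} = ?"
        params.append(obj)
    return sql, params


# In-memory database with made-up rows, standing in for a legacy database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE herb (id INTEGER PRIMARY KEY, name TEXT, treats TEXT)")
conn.executemany(
    "INSERT INTO herb VALUES (?, ?, ?)",
    [(1, "ginseng", "fatigue"), (2, "ginger", None)],
)

sql, params = rewrite(("?herb", "tcm:treats", "fatigue"))
result = conn.execute(sql, params).fetchall()
print(sql)
print(result)
```

Table and column names come only from the registry, never from the query, so the literal value is the only user-supplied part and it is passed as a bound parameter.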
Visualized Mapper
Semantic Search Portal Version 1
Semantic Data Analysis for TCM
What kinds of new connections can be discovered or
mined from this huge web of data?
Graph vs Semantic Graph
Conventional Graph Model
  Node: all nodes are identical
  Edge: all edges are identical
  Basic Element: nodes stand for entities
Semantic Graph Model
  Node: nodes are labeled, different
  Edge: edges are labeled, different
  Basic Element: an RDF statement stands for facts
Semantic Graph as a Knowledge Base: Reasoning
Semantic Graph as a Complex Network: Network Analysis
Semantic Graph Mining
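The difference between the two graph models can be made concrete with a small Python sketch. The node names, prefixes and statements are invented for illustration; the point is only that labeled edges allow measures, such as degree per edge label, that an unlabeled graph cannot express.

```python
from collections import Counter

# Conventional graph: unlabeled nodes and edges, only connectivity is known.
plain_edges = {("a", "b"), ("b", "c"), ("a", "c")}

# Semantic graph: each edge is a statement (subject, predicate, object).
# These statements are illustrative, not taken from the TCM ontology.
statements = {
    ("herb:ginseng", "rdf:type", "tcm:Herb"),
    ("herb:ginseng", "tcm:treats", "disease:fatigue"),
    ("disease:fatigue", "rdf:type", "tcm:Disease"),
}


def degree(edges):
    """Node degree, a measure available on any graph."""
    counts = Counter()
    for source, target in edges:
        counts[source] += 1
        counts[target] += 1
    return counts


def degree_by_label(triples):
    """Node degree split by edge label, only possible on a semantic graph."""
    counts = Counter()
    for subject, predicate, obj in triples:
        counts[(subject, predicate)] += 1
        counts[(obj, predicate)] += 1
    return counts


plain_degree = degree(plain_edges)
labeled_degree = degree_by_label(statements)
print(plain_degree["a"])
print(labeled_degree[("herb:ginseng", "tcm:treats")])
```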
An example.
A semantic graph can connect data
from different sources and domains
while preserving the provenance of
data.
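One common way to keep provenance while merging sources is to store each statement as a quad whose fourth element names its origin. The sketch below assumes this representation; the source names, prefixes and statements are hypothetical and serve only to show a path that crosses source boundaries while each hop still records where it came from.

```python
# Each statement carries the (hypothetical) source it came from, as a quad.
quads = [
    ("herb:ginseng", "tcm:treats", "disease:fatigue", "source:tcm_clinical_db"),
    ("herb:ginseng", "tcm:contains", "compound:ginsenoside", "source:herb_chemistry_db"),
    ("compound:ginsenoside", "bio:targets", "gene:example_gene", "source:biomedical_db"),
]


def path_with_provenance(quads, start, end):
    """Depth-first search for a path start -> end, keeping each hop's source."""
    outgoing = {}
    for subject, predicate, obj, source in quads:
        outgoing.setdefault(subject, []).append((predicate, obj, source))

    def walk(node, seen):
        if node == end:
            return []
        for predicate, obj, source in outgoing.get(node, []):
            if obj not in seen:  # avoid revisiting nodes on the current path
                rest = walk(obj, seen | {obj})
                if rest is not None:
                    return [(node, predicate, obj, source)] + rest
        return None  # no path from this node

    return walk(start, {start})


path = path_with_provenance(quads, "herb:ginseng", "gene:example_gene")
for hop in path:
    print(hop)
```

Here the herb-to-gene connection exists in no single source; it appears only after integration, and the fourth element of each hop shows which source contributed it.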
An example
 Frequent Semantic Sub-graph Discovery
 Problem Descriptions:
  Semantic Sub-graph. In a semantic graph G, every transaction can be
represented as a knowledge base consisting of statements. A graph A is
a sub-graph of a graph B iff A is subsumed by B.
  Frequent Semantic Sub-graph. Given a graph g and a semantic graph G, g
is a frequent sub-graph with respect to G iff there are more than i|K|
minimum subsumed sub-graphs in G with respect to g, where i is a
user-specified minimum support threshold and |K| is the total number of
graphs in K.
  Applications:
 Network motifs identification in biological networks
 Drug Efficacy Analysis
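The frequency test in the problem description can be sketched in Python under one simplified reading, which is an assumption and not necessarily the authors' algorithm: K is a collection of small graphs, each a set of statements; the pattern g is a list of statements whose terms starting with "?" are variables; a graph subsumes g when one consistent variable binding maps every pattern statement onto a statement of the graph; and g is frequent when more than i|K| graphs subsume it. All data below is invented.

```python
def match(pattern, graph, binding=None):
    """Return True if every pattern triple maps onto some triple of graph.

    Terms starting with "?" are variables; a variable must be bound to the
    same value everywhere it occurs.
    """
    binding = binding or {}
    if not pattern:
        return True
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        new_binding = dict(binding)
        ok = True
        for p_term, g_term in zip(first, triple):
            if p_term.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if new_binding.setdefault(p_term, g_term) != g_term:
                    ok = False
                    break
            elif p_term != g_term:
                ok = False
                break
        if ok and match(rest, graph, new_binding):
            return True
    return False


def is_frequent(pattern, graphs, min_support):
    """Frequent if more than min_support * |K| graphs contain the pattern."""
    support = sum(1 for graph in graphs if match(pattern, graph))
    return support > min_support * len(graphs), support


# K: four made-up transaction graphs.
K = [
    {("h1", "rdf:type", "tcm:Herb"), ("h1", "tcm:treats", "d1")},
    {("h2", "rdf:type", "tcm:Herb"), ("h2", "tcm:treats", "d2")},
    {("h3", "rdf:type", "tcm:Herb")},
    {("h4", "rdf:type", "tcm:Formula"), ("h4", "tcm:treats", "d1")},
]
# g: "something typed as a herb that treats something".
g = [("?x", "rdf:type", "tcm:Herb"), ("?x", "tcm:treats", "?d")]

frequent, support = is_frequent(g, K, min_support=0.4)
print(frequent, support)
```

This sketch counts supporting graphs rather than individual embeddings, and it only tests a given pattern; discovering frequent patterns would additionally require enumerating candidate sub-graphs.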
Semantic data analysis
  A semantic graph contains richer information than an ordinary
graph.
 It is based upon the integration capability of semantic web.
 Much more meaningful mining results:
 Discover the facts directly.
 Find more meaningful associations among entities.
 Calculate the network parameters in a more accurate way.
 Ontological reasoning can be leveraged to further facilitate
the mining process.
 We need good tools to help do so.
DartSpora: an interactive mining engine for TCM
Summary
 A Web of Data means a lot to us.
 It can enable fancy ways of searching and browsing the daunting
online information space.
 It can also finally unleash the potential underlying disparate data
sources to greatly facilitate and advance the data mining and
knowledge discovery technology.
 But we need powerful tools to help us to achieve the goal.
Summary:
Key Benefits of Semantic Web for TCM
  Fusion of data across many scientific disciplines
 Easier recombination of data
 Querying of data at different levels of granularity
 Capture provenance of data through annotation
 Data can be assessed for inconsistencies
  Integrative knowledge discovery from a large-scale
semantic graph formed by integrating cross-institutional,
cross-disciplinary data sources.
Thanks for your time!