Semantic Graph Mining for Biomedical
Network Analysis:
A Case Study in Traditional Chinese Medicine
Tong Yu
HCLS 2008
Content
• The TCM Semantic Web
• Semantic Query and Search Portal
• Semantic Graph Mining Methods
• TCM Use Cases
• Key Benefits of Using Semantic Web Technologies for the TCM Domain
TCM Informatics: A cross-cultural and
interdisciplinary endeavor
• TCM Informatics aims at the
computerization of TCM
information and knowledge to
provide intelligent resources
for clinical decision-making,
drug discovery, and education.
• TCM Informatics is essentially
an interdisciplinary endeavor
involving Chinese Culture,
Healthcare and Life Sciences,
and Information Technology.
• The cross-cultural and
interdisciplinary nature of TCM
Informatics requires data from
interrelated domains to be
connected and shared.
Approach Overview
• We intend to connect the knowledge systems of TCM and biomedicine to facilitate cross-cultural information retrieval and data analysis.
– Engineer an ontology for the TCM domain.
– Make associations between the TCM Ontology and the Western Medicine Ontology.
– The Semantic Web integrates structured data from the territories of TCM (left) and Western Medicine (right).
– The Semantic Mediator maps relational schemas into the domain ontology by defining Semantic Views.
– The Query-Rewriting Engine translates a SPARQL query into a series of SQL queries based on mapping rules.
– Supports a variety of Web-based applications.
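The query-rewriting step can be illustrated with a minimal sketch. It assumes each mapping rule ties one ontology predicate to one relational table and two columns; the predicate names, table names, and column names below are hypothetical and are not taken from the TCM Semantic Web itself. Each triple pattern of a query becomes one SQL query.

```python
from typing import Dict, List, NamedTuple, Tuple


class Mapping(NamedTuple):
    table: str        # relational table backing the predicate
    subject_col: str  # column holding the subject identifier
    object_col: str   # column holding the object identifier


# Hypothetical mapping rules: ontology predicate -> relational columns.
MAPPING_RULES: Dict[str, Mapping] = {
    "tcm:contains": Mapping("formula_herb", "formula_id", "herb_id"),
    "tcm:treats": Mapping("formula_syndrome", "formula_id", "syndrome_id"),
}

TriplePattern = Tuple[str, str, str]  # (subject, predicate, object)


def rewrite(patterns: List[TriplePattern]) -> List[str]:
    """Translate each triple pattern into one SQL query via the mapping rules."""
    queries = []
    for subj, pred, obj in patterns:
        rule = MAPPING_RULES[pred]
        conditions = []
        # Terms starting with '?' are variables; anything else is a constant.
        # A real system would use parameterized queries instead of string
        # interpolation; this sketch only shows the shape of the translation.
        if not subj.startswith("?"):
            conditions.append(f"{rule.subject_col} = '{subj}'")
        if not obj.startswith("?"):
            conditions.append(f"{rule.object_col} = '{obj}'")
        sql = f"SELECT {rule.subject_col}, {rule.object_col} FROM {rule.table}"
        if conditions:
            sql += " WHERE " + " AND ".join(conditions)
        queries.append(sql)
    return queries


# Two triple patterns about formula FGD become a series of two SQL queries.
sql_queries = rewrite([
    ("FGD", "tcm:contains", "?herb"),
    ("FGD", "tcm:treats", "?syndrome"),
])
```

Joining the per-pattern results on shared variables is left out here; a fuller rewriter would also merge patterns that map to the same table into one query.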
The Architecture of
The TCM Semantic Web
The TCM Semantic Web portal
A Semantic Graph Model for TCM Domain
The Semantic Graph Model can
connect data from
different TCM data
sources while preserving
the provenance of the data.
We use the TCM Ontology to
integrate data about electronic
medical records (EMR),
Formulae & Drugs, and
Diseases, and to connect
the TCM data with
orthodox medicine data,
e.g. UMLS and the Gene
Ontology.
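One common way to keep provenance while merging sources is to store each statement together with the name of the graph (data source) it came from. The sketch below assumes that representation; the source names `formulae_db` and `emr_db` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: str  # named graph, i.e. the data source the statement came from


# Hypothetical statements drawn from two hypothetical sources.
quads: List[Quad] = [
    Quad("FGD", "contains", "Glycyrrhizae", "formulae_db"),
    Quad("prescription1", "prescribes", "FGD", "emr_db"),
    Quad("FGD", "contains", "Glycyrrhizae", "emr_db"),
]


def provenance(quads: List[Quad]) -> Dict[Tuple[str, str, str], Set[str]]:
    """Map each distinct statement to the set of sources that assert it."""
    sources: Dict[Tuple[str, str, str], Set[str]] = defaultdict(set)
    for q in quads:
        sources[(q.subject, q.predicate, q.obj)].add(q.graph)
    return dict(sources)


prov = provenance(quads)
```

Merging on the triple while keeping the graph name means a statement asserted by several sources appears once in the integrated view but can still be traced back to each source.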
Interactive Mining of TCM
Knowledge
The Spora System performs interactive
knowledge discovery experiments on the
Semantic Web.
Semantic Graph Mining
Implement the semantic graph mining
algorithms (importance calculation,
frequent pattern discovery, clustering,
etc.) as generic operators that work on
top of the Semantic Web layer and query
semantic graph models in SPARQL.
KDD Experiments
Users can create an Experiment by specifying a
knowledge discovery process as a tree of operators
with customizable properties, and then execute the
process and review the results rendered as interactive
tables, histograms, etc.
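A process specified as a tree of operators with customizable properties can be sketched as follows. This is an illustration of the idea only, not the Spora System's API: the `Operator` class, the three operator functions, and the sample statements are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Operator:
    """One node of the process tree: a function plus its properties."""
    name: str
    run: Callable[[List[Any], Dict[str, Any]], Any]
    properties: Dict[str, Any] = field(default_factory=dict)
    children: List["Operator"] = field(default_factory=list)

    def execute(self) -> Any:
        # Children run first; their outputs become this operator's inputs.
        inputs = [child.execute() for child in self.children]
        return self.run(inputs, self.properties)


def load_statements(inputs, props):
    return list(props["statements"])


def count_objects(inputs, props):
    # Count how often each resource appears as the object of a statement.
    return Counter(obj for _subj, _pred, obj in inputs[0])


def top_k(inputs, props):
    return inputs[0].most_common(props["k"])


# Hypothetical experiment: load -> count objects -> keep the top result.
load = Operator("load", load_statements, {"statements": [
    ("FGD", "contains", "Glycyrrhizae"),
    ("FGD", "contains", "Atractylodis"),
    ("formula2", "contains", "Glycyrrhizae"),
]})
count = Operator("count_objects", count_objects, children=[load])
experiment = Operator("top_k", top_k, {"k": 1}, [count])

result = experiment.execute()
```

Changing a property such as `k` and re-running `execute()` corresponds to adjusting an operator in the tree and reviewing the new results.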
Semantic Graph
Resource Importance
• The in-degree centrality CI of a resource is measured by the
weighted sum of statements with the resource as object, and the
out-degree centrality is measured by the weighted sum of
statements with the resource as subject.
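A minimal sketch of both degree centralities follows. The slides do not say how statements are weighted, so this sketch assumes one weight per predicate (defaulting to 1.0); the weights and statements are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Statement = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical statements and per-predicate weights.
statements: List[Statement] = [
    ("prescription1", "prescribes", "FGD"),
    ("prescription2", "prescribes", "FGD"),
    ("FGD", "contains", "Glycyrrhizae"),
]
weights: Dict[str, float] = {"prescribes": 1.0, "contains": 0.5}


def degree_centrality(statements, weights):
    """Return (in_degree, out_degree) as weighted sums over statements."""
    in_degree: Dict[str, float] = defaultdict(float)
    out_degree: Dict[str, float] = defaultdict(float)
    for subj, pred, obj in statements:
        w = weights.get(pred, 1.0)  # assumed default weight
        out_degree[subj] += w       # resource appears as subject
        in_degree[obj] += w         # resource appears as object
    return dict(in_degree), dict(out_degree)


in_degree, out_degree = degree_centrality(statements, weights)
```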
• The Closeness Centrality of a resource r is defined as the inverse of
the sum of the distances from r to all other resources.
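A minimal sketch of closeness centrality follows. It assumes the semantic graph is treated as undirected and unweighted, so distance is the number of hops found by breadth-first search, and it sums only over resources reachable from r; the statements are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical statements (subject, predicate, object).
statements = [
    ("prescription1", "prescribes", "FGD"),
    ("FGD", "contains", "Glycyrrhizae"),
    ("FGD", "contains", "Atractylodis"),
]


def build_adjacency(statements):
    """Undirected adjacency: every statement links its subject and object."""
    adj = defaultdict(set)
    for subj, _pred, obj in statements:
        adj[subj].add(obj)
        adj[obj].add(subj)
    return adj


def closeness(adj, r):
    """Inverse of the sum of hop distances from r to every reachable resource."""
    dist = {r: 0}
    queue = deque([r])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    total = sum(dist.values())
    return 1.0 / total if total else 0.0


adj = build_adjacency(statements)
closeness_fgd = closeness(adj, "FGD")                     # distances 1 + 1 + 1
closeness_prescription = closeness(adj, "prescription1")  # distances 1 + 2 + 2
```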
• The Betweenness Centrality of a resource r is defined as the fraction of
shortest paths in the graph that pass through r.
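A minimal sketch of betweenness centrality follows. It assumes an undirected, unweighted graph and uses the standard formulation: for every pair of other resources, take the share of their shortest paths that pass through r, then average over all such pairs. The adjacency data is hypothetical, and the brute-force approach is meant for small graphs only.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected graph: FGD is linked to three other resources.
adj = {
    "FGD": {"prescription1", "Glycyrrhizae", "Atractylodis"},
    "prescription1": {"FGD"},
    "Glycyrrhizae": {"FGD"},
    "Atractylodis": {"FGD"},
}


def bfs_counts(adj, source):
    """Return (distance, number of shortest paths) from source to each node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                sigma[neighbour] = 0
                queue.append(neighbour)
            if dist[neighbour] == dist[node] + 1:
                sigma[neighbour] += sigma[node]
    return dist, sigma


def betweenness(adj, r):
    """Average share of shortest paths between other pairs that pass through r."""
    info = {node: bfs_counts(adj, node) for node in adj}
    dist_r, sigma_r = info[r]
    score = 0.0
    for s, t in combinations([n for n in adj if n != r], 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s or r not in dist_s or t not in dist_r:
            continue  # skip pairs that are not connected through r at all
        if dist_s[r] + dist_r[t] == dist_s[t]:
            score += sigma_s[r] * sigma_r[t] / sigma_s[t]
    n = len(adj)
    pairs = (n - 1) * (n - 2) / 2
    return score / pairs if pairs else 0.0


betweenness_fgd = betweenness(adj, "FGD")
betweenness_herb = betweenness(adj, "Glycyrrhizae")
```

For larger graphs, Brandes' algorithm computes the same quantity far more efficiently than this pairwise loop.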
Semantic Associations
• pathAssociated
– <prescription1 prescribes TCM Formula FGD> AND <Formula FGD contains the Herb Glycyrrhizae>,
– so that:
– <prescription1 & Glycyrrhizae are pathAssociated>.
• joinAssociated
– <prescription1 prescribes a Formula FGD> to <treat the TCM Syndrome KYD>,
– so that:
– <FGD & KYD are joinAssociated> with prescription1 as the join point.
• classPathAssociated
– <Glycyrrhizae is of type Herb> AND <Atractylodis is of type Drug> AND <Herb is a subclass of Drug>,
– so that:
– <Glycyrrhizae and Atractylodis are cpAssociated>.
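The three association types can be checked on the slide's own examples with a simplified sketch. It assumes pathAssociated means a directed path exists between two resources, joinAssociated means a third resource has paths to both, and classPathAssociated means the resources' types are equal or linked by a subclass path. The predicate names (`prescribes`, `contains`, `treats`, `type`, `subClassOf`) are hypothetical labels for the relations in the examples.

```python
from collections import defaultdict, deque

statements = [
    ("prescription1", "prescribes", "FGD"),
    ("FGD", "contains", "Glycyrrhizae"),
    ("prescription1", "treats", "KYD"),
    ("Glycyrrhizae", "type", "Herb"),
    ("Atractylodis", "type", "Drug"),
    ("Herb", "subClassOf", "Drug"),
]

SCHEMA_PREDICATES = {"type", "subClassOf"}


def build_successors(statements, keep):
    """Directed adjacency restricted to predicates accepted by keep()."""
    succ = defaultdict(set)
    for subj, pred, obj in statements:
        if keep(pred):
            succ[subj].add(obj)
    return succ


def reachable(succ, start):
    """All resources reachable from start by following directed edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in succ.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


instance_succ = build_successors(statements, lambda p: p not in SCHEMA_PREDICATES)
subclass_succ = build_successors(statements, lambda p: p == "subClassOf")
types = build_successors(statements, lambda p: p == "type")


def path_associated(a, b):
    return b in reachable(instance_succ, a) or a in reachable(instance_succ, b)


def join_points(a, b):
    """Resources (other than a and b) that have a path to both a and b."""
    return {
        j for j in list(instance_succ)
        if j not in (a, b) and {a, b} <= reachable(instance_succ, j)
    }


def class_path_associated(a, b):
    for type_a in types.get(a, ()):
        for type_b in types.get(b, ()):
            if (type_a == type_b
                    or type_b in reachable(subclass_succ, type_a)
                    or type_a in reachable(subclass_succ, type_b)):
                return True
    return False
```

With these definitions, `join_points("FGD", "KYD")` returns `{"prescription1"}`, matching the join point named in the example.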
Frequent Semantic Subgraph
Pattern Interpretation
• Discovered patterns can be annotated with domain
knowledge based on semantic associations of concepts,
and visualized as a rich graph to facilitate human
interpretation. Here semantic search is used to discover
latent semantic associations of concepts.
• This example pattern, which includes four herbs
and two drug efficacies, is explained by
the fact that the formula FGD, composed of
these herbs, has these two drug efficacies.
The Semantic Network of
herb-drug interactions
• The TCM domain involves a complex network of drug interactions.
• We use Traditional Chinese Medicine (TCM) information
resources to map an extensive view of herb-drug
interactions.
• This network is mapped through semantic integration of
legacy relational databases in the TCM domain.
• Domain experts use this network to rank
topologically important herbs and drugs, to retrieve semantic
associations between drugs, and to discern interesting
patterns such as frequent subgraphs and community
structures.
The Semantic Network of
herb-drug interactions: The process
• Data Modeling: Represent domain knowledge and facts in named semantic graphs.
• Data Transformation & Integration: Translate structured or semi-structured data into Semantic Web languages.
• Entity Disambiguation: 1344 CVD-associated herbs are identified.
• Interaction Identification: The nature and frequency of interactions between all pairs of drugs are discovered through semantic association.
• Network Generation: A global semantic graph of diverse drug interactions is generated by inserting a statement for every interaction.
• Clustering: Drug communities are discovered through semantic graph clustering.
Global network of frequent herb-drug interactions,
with drugs represented by nodes with size/font
proportional to degree, interactions represented by
edges, and drug communities represented by colors.
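The network-generation and clustering steps can be sketched on toy data. The slides do not name the clustering method, so this sketch substitutes a deliberately simple stand-in: it keeps only interactions seen at least `min_frequency` times and treats each connected component of the resulting graph as a community. The drug names and interaction records are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical interaction records, one (drug, drug) pair per observation.
interactions = [
    ("herb_a", "drug_b"), ("drug_b", "herb_a"), ("herb_a", "drug_b"),
    ("herb_c", "drug_d"), ("herb_c", "drug_d"),
    ("drug_b", "herb_c"),
]


def build_network(interactions, min_frequency):
    """Count each unordered pair and keep the frequent ones as edges."""
    frequency = Counter(tuple(sorted(pair)) for pair in interactions)
    adj = defaultdict(set)
    for (a, b), count in frequency.items():
        if count >= min_frequency:
            adj[a].add(b)
            adj[b].add(a)
    return frequency, adj


def communities(adj):
    """Stand-in clustering: connected components of the frequent-edge graph."""
    seen = set()
    groups = []
    for start in sorted(adj):
        if start in seen:
            continue
        group = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adj[node] - group)
        seen |= group
        groups.append(group)
    return groups


frequency, adj = build_network(interactions, min_frequency=2)
degree = {node: len(neighbours) for node, neighbours in adj.items()}
drug_communities = communities(adj)
```

In a rendering like the one described in the caption, `degree` would drive node and font size and the index of each group in `drug_communities` would drive node color.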
Key Benefits of Using Semantic
Web Technology
• Exposing legacy data through a semantic layer so that
it can be more easily reused and recombined.
• Linking data across database boundaries to
enable more intuitive query, search, and navigation
without awareness of the boundaries.
• The ontology serves as the controlled vocabulary for making
semantic suggestions, such as synonyms and related
concepts, to facilitate query and search.
• Reasoning capabilities such as subclassing and transitive
properties can be implemented at the semantic layer
to increase query expressiveness and retrieve
more complete answers.
• Enabling more advanced data analysis and integrative
knowledge discovery based on the huge web of data.
Conclusion
• We took the first systematic approach to leverage the
progress of Biomedical Informatics to address the
modernization of TCM.
• Domain experts evaluate the platform’s major technical
features as original and productive in Drug Safety and
Efficacy analysis.
• This case study demonstrates the Semantic Web’s
advantages in representation, integration, and discovery
of knowledge with complex domain models.
• The work contributes to the preservation and modernization of
TCM as intangible cultural heritage.