Parkinson’s Disease Ontology
Outline
• Use Case:
– Parkinson’s Disease
• Seed Ontology
• Design Issues
• Extending the seed ontology
• Next Steps
Use Case: Parkinson’s Disease
• Description of Parkinson’s Disease from different perspectives:
– Systems Physiology View
– Cellular and Molecular Biologist View
– Clinical Researcher View
– Clinical Guideline Formulator View
– Clinical Decision Support Implementer View
– Primary Care Clinician View
– Neurologist View
• Identify Information Needs of the stakeholders identified above
• Available at:
– http://esw.w3.org/topic/HCLS/ParkinsonUseCase
• Developed by:
– Don Doherty
– Ken Kawamoto
Use Case: Systems Physiology View
What chemicals (neurotransmitters) are used by each circuit
element (neuron) to communicate with the next element
(neuron)? What responses do they elicit in the neurons?
Use Case: Cellular and Molecular Biologist View
What proteins are implicated in Parkinson's disease? How are
protein expression patterns, protein processing, folding,
regulation, transport, protein-protein interactions, protein
degradation, etc. affected?
Use Case: Clinical Researcher View
Can a certain diagnostic test (e.g., a blood test for a
biomarker or an imaging study) provide an approach
to diagnosing Parkinson’s disease that is superior to
or can complement existing diagnostic approaches?
Use Case: Clinical Guideline Formulator View
What have been the results of clinical trials that have evaluated
the benefits and costs associated with diagnostic or therapeutic
interventions for Parkinson’s disease?
Use Case: Clinical Decision Support Implementer View
Which clinical guideline(s) should be used as the basis for
implementing the CDS functionality?
Use Case: Primary Care Clinician View
If a patient is not currently diagnosed with Parkinson’s disease, do
the patient’s current symptoms indicate the need for a referral
to a neurologist for further evaluation? If so, what are the
referral criteria?
Use Case: Neurologist View
What is the differential diagnosis for this patient given his/her
symptoms, signs, and diagnostic test results?
First Phase
• Focus on the Cellular and Molecular Biologist View
• Develop Parkinson’s Disease Ontology based on that View
• Refine it iteratively
• Augment it with other views later
Parkinson’s Disease Revisited
Studies identifying genes involved with Parkinson's disease are rapidly outpacing the cell biological studies
which would reveal how these gene products are part of the disease process in Parkinson's disease. The
alpha synuclein and Parkin genes are two examples.
The discovery that genetic mutations in the alpha synuclein gene could cause Parkinson's disease in families
has opened new avenues of research in the Parkinson's disease field. When it was also discovered that synuclein
was a major component of Lewy bodies, the pathological hallmark of Parkinson's disease in the brain, it became
clear that synuclein may be important in the pathogenesis of sporadic Parkinson's disease as well as rare cases of
familial Parkinson's disease. More recently, further evidence for the intrinsic involvement of synuclein in Parkinson's
disease pathogenesis was shown by the finding that the synuclein gene may be triplicated or duplicated in familial
Parkinson's disease, suggesting that simple overexpression of the wild type protein is sufficient to cause disease.
Since the discovery of synuclein, studies of genetic linkages, specific genes, and their associated coded proteins are
ongoing in the Parkinson's disease research field - transforming what had once been thought of as a purely
environmental disease into one of the most complex multigenetic diseases of the brain.
Mutations in the Parkin gene cause early-onset Parkinson's disease, and the parkin protein
has been identified as an E3 ligase, suggesting a role for the proteasomal pathway of protein degradation
in Parkinson's disease. DJ-1 and PINK-1 are proteins related to mitochondrial function in neurons, providing an
interesting genetic parallel to mitochondrial toxin studies that suggest disruptions in cellular energetics and oxidative
metabolism are primarily responsible for Parkinson's disease. Other genes, such as UCHL-1, tau, and the
glucocerebrosidase gene, may be genetic risk factors, and their potential role in the sporadic Parkinson's disease
population remains unknown. Mutations in LRRK2, which encodes a protein called dardarin, are the most recently
discovered genetic cause of Parkinson's disease, and LRRK2 mutations are likely to be the largest cause of familial
Parkinson's disease identified thus far. Dardarin is a large, complex protein with a variety of structural moieties
that could be participating in more than a dozen different cellular pathways in neurons. Because the cellular pathways
that lead to Parkinson's disease are not fully understood, it is currently unknown how, or whether, any of these pathways
intersect in Parkinson's disease pathogenesis.
Step 1: Identify concepts and subsumption hierarchies
Step 2: Identify relationships
Step 3: Look at Information Queries
• What cell signaling pathways are implicated in the pathogenesis of Parkinson’s disease?
• In which cells?
• What proteins are involved in which pathways?
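The information queries above can be sketched as pattern matches over subject-predicate-object triples. This is a minimal illustration, not the seed ontology itself: the predicate names (`participates_in`, `implicated_in`, `active_in`) and class names are hypothetical, and the few facts are loosely paraphrased from the description of Parkin, DJ-1 and PINK-1 earlier in this transcript.

```python
# Toy triple store: (subject, predicate, object).
# Predicate and class names are hypothetical; facts are illustrative only.
TRIPLES = {
    ("Parkin", "participates_in", "ProteasomalPathway"),
    ("DJ-1", "participates_in", "MitochondrialFunction"),
    ("PINK-1", "participates_in", "MitochondrialFunction"),
    ("ProteasomalPathway", "implicated_in", "ParkinsonsDisease"),
    ("MitochondrialFunction", "implicated_in", "ParkinsonsDisease"),
    ("MitochondrialFunction", "active_in", "Neuron"),
}


def match(s=None, p=None, o=None):
    """Return matching triples in sorted order; None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def pathways_implicated_in(disease):
    """Which pathways are implicated in the given disease?"""
    return [s for s, _, _ in match(p="implicated_in", o=disease)]


def cells_for(pathway):
    """In which cells is the given pathway active?"""
    return [o for _, _, o in match(s=pathway, p="active_in")]


def proteins_by_pathway(disease):
    """Which proteins are involved in which implicated pathways?"""
    return {
        pathway: [s for s, _, _ in match(p="participates_in", o=pathway)]
        for pathway in pathways_implicated_in(disease)
    }
```

In a real deployment these questions would more likely be posed as SPARQL queries against an RDF store; the functions here only show the shape of the joins each question requires.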
Design Issues: Modeling
• Modeling as relationships vs classes
– E.g., LRRK2 transcribed_into Dardarin, vs
– Define a class called Transcription as follows:
– Transcription
• has_gene: LRRK2
• has_protein: Dardarin
• Modeling a disease as a dynamic process as opposed to a static class
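The two modeling options can be compared side by side in a small sketch. It uses LRRK2 and dardarin, which the text above pairs as gene and encoded protein. The `evidence` slot is a hypothetical addition, included only to show what the class-based form makes room for.

```python
from dataclasses import dataclass

# Option 1: a binary relationship stored as a plain triple.
direct = ("LRRK2", "transcribed_into", "Dardarin")


# Option 2: the relationship promoted to a class. Each instance carries the
# participants as property values and can hold extra attributes.
@dataclass(frozen=True)
class Transcription:
    has_gene: str
    has_protein: str
    evidence: str = ""  # hypothetical extra slot, not part of the slides


def to_triples(node_id, t):
    """Expand a Transcription instance into triples about a new node."""
    triples = [
        (node_id, "rdf:type", "Transcription"),
        (node_id, "has_gene", t.has_gene),
        (node_id, "has_protein", t.has_protein),
    ]
    if t.evidence:
        triples.append((node_id, "evidence", t.evidence))
    return triples
```

The trade-off is visible in the triple count: the class-based form costs at least three triples per fact and an extra join at query time, but it is the only one of the two that can attach further statements to the transcription event itself.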
Design Issues: Instance vs SubClass
• A generic/specific relationship can be modeled using either
instance-of or subclass-of, e.g.
– Parkinson’s Disease subclass-of Disease vs Parkinson’s Disease
instance-of Disease
– UCHL-1 subclass-of Gene vs UCHL-1 instance-of Gene
– Synuclein subclass-of Protein vs Synuclein instance-of Protein
• What is the performance impact of these relationships?
– Instance-of involves ABox reasoning
– Subclass-of involves TBox reasoning
– Is one more scalable than the other?
• What is the impact on expressivity?
– Can “more” knowledge be represented using one over the other?
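A small sketch can make the ABox/TBox distinction concrete. It assumes a toy hierarchy in which `NeurodegenerativeDisease` is a hypothetical intermediate class and `patient42_condition` is a hypothetical individual; neither appears in the slides.

```python
# TBox: class hierarchy, child -> declared parents.
SUBCLASS_OF = {
    "ParkinsonsDisease": {"NeurodegenerativeDisease"},  # hypothetical parent
    "NeurodegenerativeDisease": {"Disease"},
}

# ABox: individual -> asserted classes.
INSTANCE_OF = {"patient42_condition": {"ParkinsonsDisease"}}


def superclasses(cls):
    """All classes reachable via subclass-of, including cls (TBox only)."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def types_of(individual):
    """Asserted plus inherited types of an individual (ABox plus TBox)."""
    result = set()
    for cls in INSTANCE_OF.get(individual, ()):
        result |= superclasses(cls)
    return result
```

If Parkinson’s Disease were instead modeled as an instance of Disease, it could not have subclasses or instances of its own in this scheme, which is one way the choice affects expressivity.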
Design Issue: Granularity
• At what level of specificity should relationships be represented
in the ontology?
– AllelicVariant causes Disease, vs
– LRRK2Variant causes Parkinson’s Disease
• At what level of genericity should relationships be represented
in the ontology?
– LewyBody hallmark_of Parkinson’s Disease, vs
– AnatomicalEntity hallmark_of Disease
Design Issue: Uncertainty
• “The discovery that genetic mutations in the alpha synuclein
gene could cause Parkinson's disease in families”
• The OWL/RDF metamodels do not directly support expressing the
uncertainty (“could cause”) in this statement.
• What could be ways of expressing these?
– Using reification in RDF?
– Introducing new relationships in OWL?
• What impact would this have on:
– Data Integration?
– Reasoning?
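The reification option can be sketched as follows. The `rdf:Statement`, `rdf:subject`, `rdf:predicate` and `rdf:object` terms are the standard RDF reification vocabulary; the annotation properties (`modality`, `scope`) and the class name `AlphaSynucleinMutation` are hypothetical and chosen only for illustration.

```python
import itertools

_ids = itertools.count(1)


def reify(subject, predicate, obj, **annotations):
    """Return RDF-reification-style triples describing one statement.

    The annotation property names passed as keyword arguments are
    hypothetical; only the rdf:* terms are standard vocabulary.
    """
    node = f"_:stmt{next(_ids)}"
    triples = [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", subject),
        (node, "rdf:predicate", predicate),
        (node, "rdf:object", obj),
    ]
    # Attach qualifiers to the statement node, not to the subject itself.
    triples += [(node, key, value) for key, value in sorted(annotations.items())]
    return triples


hedged = reify(
    "AlphaSynucleinMutation", "causes", "ParkinsonsDisease",
    modality="could", scope="familial",
)
```

Note that the output describes the statement without asserting it: the triple `(AlphaSynucleinMutation, causes, ParkinsonsDisease)` is absent. That is the likely impact on reasoning and data integration, since a consumer that looks only for `causes` triples will not see the hedged claim at all.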
Design Issue: Domain/Range Polymorphism
• What are the semantics of multiple domains and ranges?
– Property: associated_with
– domain: Pathway
– domain: Protein
– range: Cell
– range: Biomarker
• Are RDF/OWL Semantics good enough for us?
• Do we need to remodel relationships to avoid this?
• Different types of polymorphic relationships:
– Sub-type polymorphism
– Ad-hoc polymorphism
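Under RDFS semantics, multiple domain (or range) declarations on one property are read conjunctively: every subject is inferred to belong to all declared domains, not to one of them. The sketch below applies that rule to the `associated_with` example. The individuals `Parkin` and `Neuron` are illustrative.

```python
# Declarations from the associated_with example above.
DOMAINS = {"associated_with": ["Pathway", "Protein"]}
RANGES = {"associated_with": ["Cell", "Biomarker"]}


def infer_types(triples):
    """Apply the RDFS domain and range entailment rules to each triple.

    Every subject gets all declared domains of the property and every
    object gets all declared ranges.
    """
    types = {}
    for s, p, o in triples:
        for cls in DOMAINS.get(p, ()):
            types.setdefault(s, set()).add(cls)
        for cls in RANGES.get(p, ()):
            types.setdefault(o, set()).add(cls)
    return types


inferred = infer_types([("Parkin", "associated_with", "Neuron")])
```

Here `Parkin` is inferred to be both a Pathway and a Protein, and `Neuron` both a Cell and a Biomarker, which is probably not what the modeler intended. If the classes are declared disjoint, the result is an inconsistency. One common way around this is to declare a single domain that is the union of the intended classes, or to split the property into more specific sub-properties.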
Design Issue: Default Values
• How do we handle default values of OWL properties?
• Example:
– Default function of the proteasomal pathway is protein degradation
• What is the impact of default values on biomedical data
integration? Reasoning?
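OWL itself has no construct for overridable defaults, so one option is to resolve them in application code outside the reasoner. The following sketch assumes that approach; the table layout and the individual names `pathway_in_cell_A` and `pathway_in_cell_B` are hypothetical.

```python
# Class-level defaults kept outside the ontology (an assumption of this sketch).
CLASS_DEFAULTS = {
    "ProteasomalPathway": {"function": "ProteinDegradation"},
}


def lookup(asserted, individual, cls, prop):
    """Return the asserted value if present, else the class default, else None."""
    value = asserted.get((individual, prop))
    if value is not None:
        return value
    return CLASS_DEFAULTS.get(cls, {}).get(prop)


# Explicit assertions about individuals; the second individual has none.
asserted = {("pathway_in_cell_A", "function"): "AntigenProcessing"}

explicit = lookup(asserted, "pathway_in_cell_A", "ProteasomalPathway", "function")
fallback = lookup(asserted, "pathway_in_cell_B", "ProteasomalPathway", "function")
```

The override makes this lookup non-monotonic: adding an assertion can retract a previously returned answer. That matters for data integration, because merging a second source may silently change values that were only defaults in the first.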
Design Issue: Ontology Inclusion
• Cross-linking to other ontologies such as GO, Neuronames,
etc.
• If we “link” to a class or property in another ontology:
– Should we include associated sub classes?
– Should we include associated properties?
– Should we include associated axioms?
• What if this leads to inconsistencies?
– Cycles
– Contradictions
• How does this impact data integration or reasoning?
– Can we get by with “shallow” inclusion?
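One of the risks listed above, cycles introduced by inclusion, can be checked mechanically after merging the subclass hierarchies. The sketch below uses a depth-first search; the class names in the example are hypothetical and do not reflect the actual content of GO or NeuroNames.

```python
def merge(*ontologies):
    """Union several child -> parents maps into one hierarchy."""
    merged = {}
    for onto in ontologies:
        for child, parents in onto.items():
            merged.setdefault(child, set()).update(parents)
    return merged


def find_cycle(edges):
    """Return one subclass-of cycle as a list of classes, or None.

    edges maps each class to the classes it is declared a subclass of.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for parent in sorted(edges.get(node, ())):
            state = color.get(parent, WHITE)
            if state == GREY:  # back edge: parent is already on the path
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for start in sorted(edges):
        if color.get(start, WHITE) == WHITE:
            found = visit(start)
            if found:
                return found
    return None


seed = {"Aggresome": {"InclusionBody"}}          # hypothetical seed axiom
imported = {"InclusionBody": {"Aggresome"}}      # hypothetical imported axiom
```

Neither map contains a cycle on its own, but `find_cycle(merge(seed, imported))` reports one. In OWL such a cycle is not a logical contradiction; it makes the classes involved equivalent, which may still be an unintended result of the inclusion.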
Ontology Modularization
• Mutually disjoint trees with “cross cutting” properties, axioms, etc.
• Proposed by Alan Rector
• Example: Different hierarchies/lattices for
– Studies (e.g., publication in Pubmed)
– Biomedical knowledge referenced in those studies (e.g., association
between a gene and a disease)
Design Issue: Higher Order Relationships
• Example
– Association between a Gene and a Disease mentioned in a study
Creation of Best Practices
• Design issues have been the subject of investigation in the
Knowledge Engineering and Medical Informatics communities
• Different approaches to resolve these issues will be appropriate
in the context of different use cases.
• Goal:
– Propose various alternatives in the context of use cases proposed in
HCLSIG
Extending the Seed Ontology
• Identify concepts and properties to include from:
– Gene Ontology
– NeuroNames
• Decide the “level of inclusion”
Extending the Seed Ontology
• Look at statements from research articles to extend the ontology
• Example:
– Aggresomes formed by alpha-synuclein and synphilin-1 are cytoprotective.
– Create a new property called formedBy
• domain(formedBy) = Aggresome
• range(formedBy) = Protein
subClassOf(
  intersectionOf(Aggresome,
    Restriction(formedBy, someValuesFrom(intersectionOf(alpha-synuclein, synphilin-1)))),
  Restriction(function, hasValue(cytoprotective))
)
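The effect of the axiom above can be sketched as a rule applied to individuals. This sketch makes an interpretive assumption: it reads the left-hand side as "an aggresome formed by at least one alpha-synuclein and at least one synphilin-1", that is, as two separate existential restrictions. The single restriction as written, with an intersection of the two protein classes as filler, would instead require one protein that is both. The `Individual` structure is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Individual:
    name: str
    types: set = field(default_factory=set)
    formed_by: set = field(default_factory=set)  # protein classes of the fillers
    function: set = field(default_factory=set)


# Protein classes the aggresome must be formed by (assumed reading, see above).
REQUIRED = {"alpha-synuclein", "synphilin-1"}


def apply_axiom(ind):
    """If ind matches the left-hand class expression, add the implied function."""
    if "Aggresome" in ind.types and REQUIRED <= ind.formed_by:
        ind.function.add("cytoprotective")
    return ind


both = apply_axiom(Individual("agg1", {"Aggresome"}, {"alpha-synuclein", "synphilin-1"}))
one = apply_axiom(Individual("agg2", {"Aggresome"}, {"alpha-synuclein"}))
```

Only `agg1` is classified as cytoprotective. An OWL reasoner would reach the same conclusion under the two-restriction reading, but it would do so under the open-world assumption, so the absence of a synphilin-1 filler on `agg2` would leave its status unknown rather than false.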
Next Steps
• Apply this ontology to demonstrate the Parkinson’s Disease Use Case
• Focus of the BIONT – BIORDF Collaborative F2F