Ontology of Genetic Susceptibility Factors to Diabetes Mellitus

Ontology of Genetic Susceptibility Factors to Diabetes Mellitus (OGSF-DM)
Yu Lin, Norihiro Sakamoto
Department of Sociomedical Informatics, Graduate School of Medicine, Kobe University
Agenda
• What are Genetic Susceptibility Factors (GSF)?
• How do we confirm genetic susceptibility?
• Why do we need an ontology?
• The Ontology of Genetic Susceptibility Factors to Diabetes Mellitus (OGSF-DM)
  • Methodology
  • Testing
• Discussion
2008/02
InterOntology08
Search “Genetic Susceptibility” in UMLS
Scope of “GSF to Diabetes Mellitus”
Those genetic characteristics, and interactions between genetic and environmental factors, that increase the probability of developing diabetes mellitus (DM). If they “decrease” the probability, they are “resistance” factors.
Genetic characteristics:
• polymorphism
• linked loci
• SNP
• haplotype
• genotype
Mendelian Disease vs. Complex Disease
Ref: [Rioux JD, Abbas AK.] Paths to understanding the genetic basis of autoimmune disease. Nature. 2005 Jun 2;435(7042):584-9. Review.
How to confirm the GSF
• By combining family-based linkage studies with population-based association studies
• By combining a genetic (gene-by-gene, functional-candidate) association approach with a genome-wide association approach
• By combining statistical studies with biological function studies
Factors Affecting Statistical Power of Confirming GSF
• Number of disease variants
• Allele frequencies in the population
• Effect size on disease phenotype
  • Odds Ratio (OR)
• Population structure and geography
• Selection bias
• Genotype and phenotype misclassification errors
Ref: [Wang WYS, et al.] Genome-wide association studies: theoretical and practical concerns. Nat Rev Genet 2005, 6:109-118.
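The effect-size item can be made concrete with a small worked example. The sketch below computes an odds ratio and a Woolf (log-based) 95% confidence interval from a 2x2 case-control table. The function name and the allele-carrier counts are hypothetical, chosen only for illustration; they are not taken from the cited studies.

```python
import math


def odds_ratio_with_ci(case_exposed, case_unexposed,
                       control_exposed, control_unexposed, z=1.96):
    """Return (OR, lower, upper) for a 2x2 table using the Woolf (logit) interval.

    Hypothetical helper for illustration; all four counts must be > 0.
    """
    odds_ratio = (case_exposed * control_unexposed) / (case_unexposed * control_exposed)
    # Standard error of ln(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / case_exposed + 1 / case_unexposed
                          + 1 / control_exposed + 1 / control_unexposed)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Made-up counts: 20 of 100 cases and 10 of 100 controls carry the risk allele.
or_value, ci_low, ci_high = odds_ratio_with_ci(20, 80, 10, 90)
print(f"OR = {or_value:.2f}, 95% CI = {ci_low:.2f}-{ci_high:.2f}")
```

Because the standard error depends on every cell count, small samples or rare alleles widen the interval, which is one way the listed factors reduce statistical power.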
No Criteria Established
There are no established criteria for confirming GSF (Genetic Susceptibility Factors):
• OR 1.5-2.0?
• Sample size?
• Population?
Can we settle this?
A Knowledge Base is Needed
• The primary idea is to catalog all GSF to Diabetes Mellitus (DM)
• The reality of research on GSF to DM:
  • Different levels of genetic object
  • Different types of study design
  • Inconsistent results
  • Complex phenotypes of DM
• Such heterogeneous datasets demand a knowledge base on this topic
Ontology in General
• Originally from philosophy
• An ontology is a “specification of a shared conceptualization” [Gruber T.]
• Ontology as an approach to “annotation of multiple bodies of data” [Smith B. et al.]
• Widely used in computer science and information science as a form of knowledge representation:
  • artificial intelligence
  • the Semantic Web
  • software engineering
  • biomedical informatics (the Gene Ontology is a successful example)
  • library science
  • information architecture
Ref: http://en.wikipedia.org/wiki/Ontology_%28computer_science%29
Ontology is a Good Tool
In our case, ontology can help with:
• Knowledge representation
• Database design
• Content-oriented analysis
• Information retrieval and extraction
• Information integration
By setting rules, can we establish criteria to demonstrate either genetic susceptibility or causality for a complex disease?
Agenda
• What are Genetic Susceptibility Factors (GSF)?
• How do we confirm genetic susceptibility?
• Why do we need an ontology?
• The Ontology of Genetic Susceptibility Factors to Diabetes Mellitus (OGSF-DM)
  • Methodology
  • Testing
• Discussion
The Methodology of OGSF-DM
• Specification: specify the domain and scope
• Conceptualization: build the conceptual model
• Integration: reuse and import other ontologies
• Implementation and evaluation: Protégé 3.3.1, OWL, SWRL rules
Step 1. Specification
• Domain: represent the knowledge of GSF to DM and related phenotypes
• Explore relevant literature resources:
  • PubMed: a corpus of 5873 abstracts (as of 31 Oct. 2007)
  • Books:
    • Joslin’s Diabetes Mellitus
    • Human Molecular Genetics 3
• The most fundamental terms:
  i) Human disease: diabetes mellitus and related disorders;
  ii) Phenotypes and observed quantitative parameters;
  iii) Genetic concepts;
  iv) Geographical regions;
  v) Disease gene study of the original paper.
Step 2. Conceptualization
• The core conception was generated by analyzing the titles of the corpus
• The conception shows an N-ary relationship
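One common way to represent an N-ary relationship is to reify it: the relationship itself becomes an object that links to each participant. The sketch below illustrates the idea with a Python dataclass. The class and field names are hypothetical and are not the OGSF-DM property names; the example values come from the worked article used later in this presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedRelationship:
    """A reified N-ary relationship (hypothetical field names)."""
    relationship_type: str   # e.g. "associated"
    genetic_factor: str      # SNP, allele, haplotype, ...
    disease: str
    population: str
    study: str               # identifier of the source publication

    def describe(self) -> str:
        return (f"{self.genetic_factor} is {self.relationship_type} with "
                f"{self.disease} in {self.population} (study {self.study})")


example = ObservedRelationship(
    relationship_type="associated",
    genetic_factor="SNP rs3818247",
    disease="Type 2 Diabetes",
    population="an Ashkenazi Jewish population",
    study="PMID 15047632",
)
print(example.describe())
```

Keeping the relationship as its own object lets supporting evidence, such as an odds ratio, attach to the relationship rather than to the SNP or the disease alone.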
The top-level of OGSF-DM

Terms adopted from BFO (Basic Formal Ontology): Continuant, Occurrent, Independent_Continuant, Dependent_Continuant, Quality
The position of core concepts
CLASS: Observed_Relationship
• Class hierarchy
• Constraints of class
The term ‘Allele’ is polysemous
• Genetics definition: an allele is either one of a pair (or series) of alternative forms of a gene that can occupy the same locus on a particular chromosome, and that control the same character of the phenotype. (http://www.thefreedictionary.com/allele)
• “Allele” appears in different forms across resources:

Meaning of allele | Form in which it appears | Resource
the variant of a gene in an individual | disease “allele” | original paper
representation of a SNP | “allele/allele” at the DNA, RNA and amino acid levels | HGVBase
allele sharing in sibs | IBS, IBD “allele” | linkage study
Allele CLASS in OGSF-DM
• An abstraction
• Currently, it satisfies the data model
• Needs to be refined in the future
Gene concept has evolved
• 1860s-1900s: Gene as a discrete unit of heredity
• 1910s: Gene as a distinct locus
• 1940s: Gene as a blueprint for a protein
• 1950s: Gene as a physical molecule
• 1960s: Gene as transcribed code
• 1970s-1980s: Gene as ORF sequence pattern
• 1990s-2000s: Gene as annotated genomic entity
• 2007-: Gene as …
Ref: [Gerstein MB, et al.] What is a gene, post-ENCODE? History and updated definition. Genome Research. 2007 Jun;17(6):669-81.
Some definitions of ‘gene’
• Human Genome Nomenclature Organization: “a DNA segment that contributes to phenotype/function. In the absence of demonstrated function a gene may be characterized by sequence, transcription or homology” (Wain et al. 2002)
• Rat Genome Database: “the DNA sequence necessary and sufficient to express the complete complement of functional products derived from a unit of transcription” (2003)
• Sequence Ontology Consortium: “locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions and/or other functional sequence regions” (Pearson 2006)
• ENCODE Project Consortium: “The gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.” (Gerstein et al. 2007)
• MeSH: genes are “Specific sequences of nucleotides along a molecule of DNA (or, in the case of some viruses, RNA) which represent functional units of HEREDITY. Most eukaryotic genes contain a set of coding regions (EXONS) that are spliced together in the transcript, after removal of intervening sequence (INTRONS) and are therefore labeled split genes.”
Gene CLASS in OGSF-DM
• A placeholder
• An instance of Gene is the name of the gene as it appears in the research paper
Step 3. Integration
Importing two ontologies:
• Ontology of glucose metabolism disorders
  • A slim OBO file was extracted from the Human Disease Ontology
  • The OBO file was converted to an OWL file
  • The class hierarchy was restructured, and new terms from “Joslin’s Diabetes Mellitus” were added
• Ontology of geographical regions
  • Generated by hand, adopting the terms from MeSH 2008 “Geographic Locations [Z01]”
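As a rough illustration of the OBO-to-OWL step, the sketch below reads [Term] stanzas from OBO-format text and collects the is_a links that would become subclass axioms in OWL. It is a minimal parser written for this example, not the converter used by the authors, and the term IDs and names in the sample are invented placeholders rather than real Human Disease Ontology entries.

```python
def parse_obo_terms(text):
    """Parse [Term] stanzas into {id: {"name": str, "is_a": [ids]}}.

    Minimal sketch: handles only the id, name and is_a tags.
    """
    terms = {}
    current = None
    in_term = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected.
            in_term = line == "[Term]"
            current = None
            continue
        if not in_term or ":" not in line:
            continue
        tag, value = line.split(":", 1)
        value = value.split("!", 1)[0].strip()  # drop trailing "! comment"
        if tag == "id":
            current = terms.setdefault(value, {"name": "", "is_a": []})
        elif current is not None and tag == "name":
            current["name"] = value
        elif current is not None and tag == "is_a":
            current["is_a"].append(value)
    return terms


# Placeholder terms (hypothetical IDs), for illustration only.
SAMPLE_OBO = """\
[Term]
id: EX:0001
name: glucose metabolism disorder

[Term]
id: EX:0002
name: diabetes mellitus
is_a: EX:0001 ! glucose metabolism disorder
"""

terms = parse_obo_terms(SAMPLE_OBO)
# Each (child, parent) pair corresponds to one rdfs:subClassOf axiom in OWL.
subclass_axioms = [(child, parent)
                   for child, info in terms.items()
                   for parent in info["is_a"]]
print(subclass_axioms)
```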
Step 4. Implementation and Evaluation
• Protégé 3.3.1 + OWL
• SWRL rule example: hasPopulation-1 rule
isObservedIn(?x, ?y) ∧ hasStudyPopulation(?y, ?z) → hasPopulation(?x, ?z)
The rule infers the population (z) of an Observed_Relationship (x), where y is a Disease_Gene_Study.
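To show what this rule computes, the sketch below applies the same inference over a small set of (subject, property, object) triples in plain Python. It is only an emulation for illustration, not how Protégé or a SWRL engine evaluates rules; the function name is hypothetical, and the individuals are those of the worked example in this presentation.

```python
def infer_has_population(triples):
    """Apply: isObservedIn(x, y) AND hasStudyPopulation(y, z) -> hasPopulation(x, z)."""
    # Index each study by the populations it investigated.
    study_population = {}
    for subject, prop, obj in triples:
        if prop == "hasStudyPopulation":
            study_population.setdefault(subject, set()).add(obj)
    # Join every observed relationship with the populations of its study.
    inferred = set()
    for subject, prop, obj in triples:
        if prop == "isObservedIn":
            for population in study_population.get(obj, ()):
                inferred.add((subject, "hasPopulation", population))
    return inferred


facts = {
    ("associated_with_1", "isObservedIn", "Disease_Gene_Study_15047632"),
    ("Disease_Gene_Study_15047632", "hasStudyPopulation",
     "an_ashkenazi_jewish_population"),
}
print(infer_has_population(facts))
```

The benefit of the rule is that the population only needs to be asserted once, on the study; every relationship observed in that study inherits it.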
The example article

Full text URL: http://diabetes.diabetesjournals.org/cgi/content/full/53/4/1134
Asserting individual 1)
1) associated_with_1 ⊆ Not_Stated_Resistance_or_Susceptibility_Association
⋂ ∀ hasSupportingEvidence ( ∋ {odds_ratio_OR_1.49} )
⋂ ∃ isObservedIn ( ∋ {Disease_Gene_Study_15047632} )
⋂ ∃ isObservedRelationshipOf ( ∋ {a_3_intronic_SNP_rs3818247} )
⋂ ∃ isRelationshipWith ( ∋ {Type_2_Diabetes} )
This means that a 3’ intronic SNP, rs3818247, is associated with Type 2 Diabetes, with supporting evidence of OR 1.49. The relationship is an association, but the study does not state whether the SNP is a susceptibility factor or a resistance factor.
Asserting individuals 2), 3), 4)
2) odds_ratio_OR_1.49 ⊆ Odds_Ratio
⋂ ∀ hasOR ( ∋ {1.49} )
⋂ ∀ hasCI95 ( ∋ {1.15-1.90} )
⋂ ∃ hasP ( ∋ {Corrected_P_0.0252} ⋂ {Uncorrected_P_0.0028} )
⋂ ∃ hasClassifiedGroup ( ∋ {Control_Group_1} ⋂ {Case_Group_1} )
3) Control_Group_1 ⊆ Classified_Group
⋂ ∃ hasPopulationSize ( ∋ {342 int} )
⋂ ∀ isPartOf ( ∋ {an_ashkenazi_jewish_population} )
4) Case_Group_1 ⊆ Classified_Group
⋂ ∃ hasPopulationSize ( ∋ {275 int} )
⋂ ∀ isPartOf ( ∋ {an_ashkenazi_jewish_population} )
2), 3) and 4) together mean that the study was a case-control study (case size = 275, control size = 342) in an Ashkenazi Jewish population.
Result: Odds Ratio 1.49 (95% CI: 1.15-1.90, corrected P = 0.0252, uncorrected P = 0.0028).
Asserting individuals 5), 6)
5) Disease_Gene_Study_15047632 ⊆ Disease_Gene_Study
⋂ ∀ hasPubMedID ( ∋ {PMID_15047632} )
⋂ ∃ hasStudyPopulation ( ∋ {an_ashkenazi_jewish_population} )
⋂ ∀ hasURI ( ∋ {http://diabetes.diabetesjournals.org/cgi/content/full/53/4/1134} )
6) an_ashkenazi_jewish_population ⊆ Population_Group
⋂ ∃ hasPopulationCharacteristic ( ∋ {Jews} )
⋂ ∃ hasGeographicalSite ( ∋ {Israel} ⋂ {U.S.} )
5) and 6) mean:
① An Ashkenazi Jewish population was investigated in this study;
② The population belongs to the Jewish ethnic group and is located in Israel and the U.S.;
③ The PubMed ID and URL of this paper were collected.
The core conception
Putting 1)-6) together, the core conception of this one relationship is built:
A relationship {associated} between {a_3_intronic_SNP_rs3818247} and {Type_2_Diabetes}, observed in {an_ashkenazi_jewish_population}, from a study {PMID_15047632}.
Representation of a SNP
a_3_intronic_SNP_rs3818247 ⊆ htSNP
⋂ ∃ hasAlleleComponent ( ∋ {DNA_Level_Allele_T} ⋂ {DNA_Level_Allele_G} )
⋂ ∃ hasGenomeSite ( ∋ {flanking_3_intronic} )
⋂ ∃ isGeneticVariantOf ( ∋ {hepatocyte_nuclear_factor-4_alpha} )
⋂ ∃ hasVariantDatabase ( ∋ {HGVBase_SNP002310533} ⋂ {dbSNP_rs3818247} )
This means that the 3’ intronic SNP rs3818247 is an htSNP of hepatocyte nuclear factor 4 alpha, located in the flanking 3’ intronic sequence of the gene. The alleles of this SNP are T/G at the DNA level.
Reference database entries:
1) HGVBase: “SNP002310533”
2) dbSNP: “rs3818247”
Discussion
• A hybrid of middle-out and top-down approaches was used to build our ontology.
• BFO is important for harmonizing the domain ontologies in our case.
• The ontology can be applied to other complex diseases as well.
• We anticipate further applications of this ontology:
  • Information retrieval
  • Knowledge base development
  • Establishing logic rules
  • Mapping or linking to other ontologies, such as GO and Mammalian Phenotype