Ontology of Genetic Susceptibility Factors to Diabetes Mellitus
(OGSF-DM)
Yu Lin, Norihiro Sakamoto
Department of Sociomedical Informatics,
Graduate School of Medicine, Kobe University
Agenda
What are Genetic Susceptibility Factors (GSF)?
How do we confirm genetic susceptibility?
Why do we need an ontology?
The Ontology of Genetic Susceptibility Factors to Diabetes Mellitus (OGSF-DM)
Methodology
Testing
Discussion
2008/02
InterOntology08
Search “Genetic Susceptibility” in UMLS
Scope of “GSF to Diabetes Mellitus”
Those genetic characteristics, and interactions between genetic and environmental factors, which increase the probability of developing diabetes mellitus (DM).
If “decrease”, then “resistance”.
Genetic characteristics: polymorphism, linked loci, SNP, haplotype, genotype
Mendelian Disease vs. Complex Disease
Ref: [Rioux JD, Abbas AK.] Paths to understanding the genetic basis of autoimmune disease.
Nature. 2005 Jun 2;435(7042):584-9. Review.
How to confirm the GSF
Through a combined family-based linkage study and population-based association study
Through a combined genetic (gene-by-gene function-candidate) association approach with a genome-wide association approach
Through a combined statistical study with a biological function study
Factors Affecting Statistical Power of Confirming GSF
Number of disease variants
Allele frequencies among population
Effect size on disease phenotype
Odds Ratio (OR)
Population structure and geography
Selection bias
Genotype and phenotype misclassification errors
Ref: [ Wang WYS, et al.] Genome-wide association studies: theoretical and practical concerns. Nat Rev Genet 2005, 6:109-118.
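Effect size in a case-control study is commonly summarized as an odds ratio. As a minimal sketch (the counts below are hypothetical and not taken from any study cited here; the function name is illustrative), the OR and a Woolf-type 95% confidence interval can be computed from a 2x2 table:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-scale) confidence interval for a 2x2 table.

    a: cases with the variant      b: cases without the variant
    c: controls with the variant   d: controls without the variant
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under Woolf's approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, chosen only for illustration.
or_value, ci_low, ci_high = odds_ratio_ci(a=120, b=155, c=110, d=232)
print(f"OR = {or_value:.2f}, 95% CI = {ci_low:.2f}-{ci_high:.2f}")
```

Smaller cell counts widen the interval, which is one way the factors listed above (allele frequency, sample size) reduce statistical power.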
No Criteria Established
There are no established criteria for confirming GSF (Genetic Susceptibility Factors)
OR 1.5-2.0?
sample size
population
Can we settle this?
A Knowledge Base is Needed
The primary idea is to catalog all GSF to Diabetes Mellitus (DM)
The reality of research on GSF to DM
Different levels of genetic object
Different types of study design
Inconsistent results
Complex phenotypes of DM
Diverse datasets demand a knowledge base on this topic
Ontology in General
Originally from philosophy
An ontology is a “specification of a shared conceptualization” [Gruber T.]
Ontology as an approach to “annotation of multiple bodies of data” [Smith B. et al.]
Widely used in computer science and information science
artificial intelligence
the Semantic Web
software engineering
biomedical informatics (Gene Ontology as a successful example)
library science
information architecture as a form of knowledge representation
Ref: http://en.wikipedia.org/wiki/Ontology_%28computer_science%29
Ontology is a Good Tool
In our case, ontology can help with:
Knowledge representation
Database design
Content-oriented analysis
Information retrieval and extraction
Information integration
By setting rules, can we establish criteria to demonstrate either genetic susceptibility or causality for complex disease?
Agenda
What are the Genetic Susceptibility Factors (GSF)?
How do we confirm genetic susceptibility?
Why do we need an ontology?
The Ontology of Genetic Susceptibility Factors to Diabetes Mellitus (OGSF-DM)
Methodology
Testing
Discussion
The Methodology of OGSF-DM
Specification: specify the domain and scope
Conceptualization: build the conceptual model
Integration: reuse and import other ontologies
Implementation, evaluation: Protégé 3.3.1, OWL, SWRL rules
Step1. Specification
Domain: Represent the knowledge of GSF to DM and related phenotypes
Explore relevant literature resources:
PubMed: a corpus of 5873 abstracts (as of 31 Oct. 2007)
Books:
Joslin’s Diabetes Mellitus
Human Molecular Genetics 3
The most fundamental terms:
i) Human disease: diabetes mellitus and related disorders;
ii) Phenotypes and observed quantity parameters;
iii) Genetic concepts;
iv) Geographical regions;
v) Disease gene study of the original paper.
Step2. Conceptualization
The core conception was generated by analyzing the titles of the corpus
The conception shows an N-ary relationship
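An N-ary relationship cannot be stated as a single binary property between two terms; a common pattern is to reify the relationship as an object of its own and link each participant to it. The sketch below illustrates that pattern in plain Python. The class, field and property names are hypothetical and chosen for illustration only; they are not the OGSF-DM vocabulary.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ObservedRelationship:
    """A reified N-ary relationship: the relationship itself is an object
    that links every participant, instead of one subject-object pair."""
    relationship_type: str
    genetic_factor: str
    disease: str
    population: str
    study: str

    def as_binary_facts(self, node_id):
        """Decompose the N-ary relationship into binary (subject, property, object) facts."""
        return [
            (node_id, "hasRelationshipType", self.relationship_type),
            (node_id, "hasGeneticFactor", self.genetic_factor),
            (node_id, "hasDisease", self.disease),
            (node_id, "hasPopulation", self.population),
            (node_id, "hasStudy", self.study),
        ]

# Placeholder values, not data from the corpus.
example = ObservedRelationship(
    relationship_type="associated",
    genetic_factor="SNP_X",
    disease="Disease_Y",
    population="Population_Z",
    study="Study_W",
)
for fact in example.as_binary_facts("relationship_1"):
    print(fact)
```

Each binary fact shares the same subject, so the five participants stay tied to one observation even though every individual statement is binary.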
The top-level of OGSF-DM
Adopted terms from BFO (Basic Formal Ontology): Continuant, Occurrent, Independent_Continuant, Dependent_Continuant, Quality
The position of core concepts
CLASS: Observed_Relationship
• Class hierarchy
• Constraints of class
The term ‘Allele’ is polysemous
Genetics definition: an allele is either one of a pair (or series) of alternative forms of a gene that can occupy the same locus on a particular chromosome, and that control the same character of the phenotype. (http://www.thefreedictionary.com/allele)
“Allele” appeared in different resources:
Meaning of Allele | Appeared Form | Resource
the variant of gene in an individual | disease “allele” | original paper
representation of SNP | “allele/allele” in DNA, RNA and amino acid level | HGVBase
allele sharing in sibs | IBS, IBD “allele” | linkage study
Allele CLASS in OGSF-DM
An abstraction
Currently, it satisfies the data model
Needs to be refined in the future
Gene concept has evolved
1860s-1900s: Gene as a discrete unit of heredity
1910s: Gene as a distinct locus
1940s: Gene as a blueprint for a protein
1950s: Gene as a physical molecule
1960s: Gene as transcribed code
1970s-1980s: Gene as ORF sequence pattern
1990s-2000s: Gene as annotated genomic entity
2007-: Gene as …
Ref: [Gerstein MB, et al.] What is a gene, post-ENCODE? History and updated definition. Genome Research. 2007 Jun;17(6):669-81.
Some definitions of ‘gene’
Human Genome Nomenclature Organization: “a DNA segment that contributes to phenotype/function. In the absence of demonstrated function a gene may be characterized by sequence, transcription or homology” (Wain et al. 2002)
Rat Genome Database: “the DNA sequence necessary and sufficient to express the complete complement of functional products derived from a unit of transcription” (2003)
Sequence Ontology Consortium: “locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions and/or other functional sequence regions” (Pearson 2006)
ENCODE Project Consortium: “The gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.” (Gerstein et al. 2007)
MeSH: genes are “Specific sequences of nucleotides along a molecule of DNA (or, in the case of some viruses, RNA) which represent functional units of HEREDITY. Most eukaryotic genes contain a set of coding regions (EXONS) that are spliced together in the transcript, after removal of intervening sequence (INTRONS) and are therefore labeled split genes.”
Gene CLASS in OGSF-DM
A placeholder
The instance of Gene is the name of the gene which appears in the research paper
Step3. Integration
Importing two ontologies:
ontology of glucose metabolism disorders
A slim OBO file was extracted from the Human Disease Ontology
The OBO file was converted to an OWL file
The class hierarchy was restructured and new terms from “Joslin’s Diabetes Mellitus” were added
ontology of geographical regions
Generated by hand, adopting the terms from MeSH 2008 “Geographic Locations [Z01]”
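The slides do not say how the slim was extracted. One plausible approach, sketched here under that assumption, is to keep a chosen root term together with every term that reaches it through is_a links. The OBO fragment, the EX: identifiers and the function names are hypothetical placeholders, not real Human Disease Ontology entries, and the parser handles only [Term] stanzas with id, name and is_a tags.

```python
from collections import defaultdict

# Hypothetical OBO fragment; IDs and names are placeholders.
OBO_TEXT = """\
[Term]
id: EX:0001
name: disease

[Term]
id: EX:0002
name: glucose metabolism disorder
is_a: EX:0001 ! disease

[Term]
id: EX:0003
name: diabetes mellitus
is_a: EX:0002 ! glucose metabolism disorder

[Term]
id: EX:0004
name: unrelated disorder
is_a: EX:0001 ! disease
"""

def parse_obo_terms(text):
    """Return {term_id: {"name": str, "is_a": [parent ids]}} for each [Term] stanza."""
    terms = {}
    for stanza in text.split("[Term]")[1:]:
        term = {"name": "", "is_a": []}
        term_id = None
        for line in stanza.strip().splitlines():
            tag, _, value = line.partition(": ")
            value = value.split(" ! ")[0].strip()  # drop trailing OBO comments
            if tag == "id":
                term_id = value
            elif tag == "name":
                term["name"] = value
            elif tag == "is_a":
                term["is_a"].append(value)
        if term_id:
            terms[term_id] = term
    return terms

def extract_slim(terms, root_id):
    """Keep the root term and every term that reaches it through is_a links."""
    children = defaultdict(list)
    for term_id, term in terms.items():
        for parent in term["is_a"]:
            children[parent].append(term_id)
    keep, stack = set(), [root_id]
    while stack:
        current = stack.pop()
        if current not in keep:
            keep.add(current)
            stack.extend(children[current])
    return {term_id: terms[term_id] for term_id in sorted(keep)}

slim = extract_slim(parse_obo_terms(OBO_TEXT), "EX:0002")
for term_id, term in slim.items():
    print(term_id, term["name"])
```

The root keeps its own is_a link to a parent outside the slim, so a real conversion step would still need to decide whether to drop or re-home that link.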
Step4. Implementation and Evaluation
Protégé_3.3.1 + OWL
SWRL rule example:
hasPopulation-1 Rule
isObservedIn(?x, ?y) ∧ hasStudyPopulation(?y, ?z) → hasPopulation(?x, ?z)
This infers the population (z) of the Observed_Relationship (x); y is a Disease_Gene_Study.
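To make the effect of the rule concrete, the following sketch applies the same pattern to a small set of triples in plain Python. It is an approximation for illustration, not SWRL and not how Protégé evaluates rules; the function name is hypothetical, and the individual names are the ones used in the example assertions later in these slides.

```python
def apply_has_population_rule(triples):
    """Apply isObservedIn(x, y) AND hasStudyPopulation(y, z) -> hasPopulation(x, z)."""
    # Index each study by its populations so the join on y is a dictionary lookup.
    study_population = {}
    for subject, prop, obj in triples:
        if prop == "hasStudyPopulation":
            study_population.setdefault(subject, set()).add(obj)
    inferred = set()
    for subject, prop, obj in triples:
        if prop == "isObservedIn":
            for population in study_population.get(obj, ()):
                inferred.add((subject, "hasPopulation", population))
    return inferred

facts = {
    ("associated_with_1", "isObservedIn", "Disease_Gene_Study_15047632"),
    ("Disease_Gene_Study_15047632", "hasStudyPopulation", "an_ashkenazi_jewish_population"),
}
for triple in apply_has_population_rule(facts):
    print(triple)
```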
The example article
Full text URL: http://diabetes.diabetesjournals.org/cgi/content/full/53/4/1134
Asserting individual 1)
1) associated_with_1 ⊆ Not_Stated_Resistance_or_Susceptibility_Association
⋂∀ hasSupportingEvidence ( ∋ {odds_ratio_OR_1.49} )
⋂∃ isObservedIn ( ∋ {Disease_Gene_Study_15047632} )
⋂∃ isObservedRelationshipOf ( ∋ {a_3_intronic_SNP_rs3818247} )
⋂∃ isRelationshipWith ( ∋ {Type_2_Diabetes_} )
This means that the 3’ intronic SNP rs3818247 is associated with Type 2 Diabetes, with supporting evidence of OR 1.49. The relationship is an association, but the study does not state whether it is a susceptibility or a resistance factor.
Asserting individual 2),3),4)
2) odds_ratio_OR_1.49 ⊆ Odds_Ratio
⋂∀ hasOR ( ∋ {1.49} )
⋂∀ hasCI95 ( ∋ {1.15-1.90} )
⋂∃ hasP ( ∋ {Corrected_P_0.0252} ⋂ {Uncorrected_P_0.0028} )
⋂∃ hasClassifiedGroup ( ∋ {Control_Group_1} ⋂ {Case_Group_1} )
3) Control_Group_1 ⊆ Classified_Group
⋂∃hasPopulationSize ( ∋ {342 int})
⋂∀isPartOf ( ∋ {an_ashkenazi_jewish_population})
4) Case_Group_1 ⊆ Classified_Group
⋂∃hasPopulationSize ( ∋ {275 int})
⋂∀isPartOf ( ∋ {an_ashkenazi_jewish_population})
2), 3) and 4) together mean that the study was a case-control study (case size = 275 and control size = 342) in an Ashkenazi Jewish population.
Result: Odds Ratio 1.49 (95% CI: 1.15-1.90, corrected P = 0.0252, uncorrected P = 0.0028).
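The same evidence can be held in a small typed record, which makes simple sanity checks possible. This is a sketch only: the class and method names are illustrative rather than part of OGSF-DM, and the check that the uncorrected P does not exceed the corrected P assumes a multiple-testing correction that never lowers a P value.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClassifiedGroup:
    name: str
    population_size: int
    part_of: str

@dataclass(frozen=True)
class OddsRatio:
    value: float
    ci95_low: float
    ci95_high: float
    corrected_p: float
    uncorrected_p: float
    case_group: ClassifiedGroup
    control_group: ClassifiedGroup

    def ci_excludes_one(self):
        """True when the 95% CI does not contain 1, the no-effect value."""
        return self.ci95_low > 1.0 or self.ci95_high < 1.0

    def is_consistent(self):
        """Basic sanity checks on the recorded numbers."""
        return (
            self.ci95_low <= self.value <= self.ci95_high
            and self.uncorrected_p <= self.corrected_p  # assumption, see lead-in
            and self.case_group.part_of == self.control_group.part_of
        )

# Values as reported on the slide.
population = "an_ashkenazi_jewish_population"
evidence = OddsRatio(
    value=1.49, ci95_low=1.15, ci95_high=1.90,
    corrected_p=0.0252, uncorrected_p=0.0028,
    case_group=ClassifiedGroup("Case_Group_1", 275, population),
    control_group=ClassifiedGroup("Control_Group_1", 342, population),
)
print(evidence.ci_excludes_one(), evidence.is_consistent())
```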
Asserting individual 5),6)
5) Disease_Gene_Study_15047632 ⊆ Disease_Gene_Study
⋂∀ hasPubMedID ( ∋ {PMID_15047632} )
⋂∃ hasStudyPopulation ( ∋ {an_ashkenazi_jewish_population} )
⋂∀ hasURI ( ∋ {http://diabetes.diabetesjournals.org/cgi/content/full/53/4/1134} )
6) an_ashkenazi_jewish_population ⊆ Population_Group
⋂∃ hasPopulationCharacteristic ( ∋ {Jews} )
⋂∃ hasGeographicalSite ( ∋ {Israel} ⋂ {U.S.} )
5) and 6) mean:
① An Ashkenazi Jewish population was investigated in this study;
② The population belongs to the Jewish ethnic group and is located in Israel and the U.S.;
③ The PubMed ID and URL of this paper were collected.
The core conception
Putting 1)-5) together, the core conception of this one relationship is built:
A relationship {associated} between the {3_intronic_SNP_rs3818247} and {Type_2_Diabetes} observed in {an_ashkenazi_jewish_population} from a study {PMID_15047632}.
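As an illustration of how the asserted individuals combine into this statement, the sketch below stores assertions 1) and 5) as triples and follows the links from the relationship to its study. The helper names are hypothetical, and the study individual uses the spelling Disease_Gene_Study_15047632 from assertion 5).

```python
facts = [
    ("associated_with_1", "isObservedIn", "Disease_Gene_Study_15047632"),
    ("associated_with_1", "isObservedRelationshipOf", "a_3_intronic_SNP_rs3818247"),
    ("associated_with_1", "isRelationshipWith", "Type_2_Diabetes_"),
    ("Disease_Gene_Study_15047632", "hasPubMedID", "PMID_15047632"),
    ("Disease_Gene_Study_15047632", "hasStudyPopulation", "an_ashkenazi_jewish_population"),
]

def lookup(subject, prop):
    """Return the single object asserted for (subject, prop)."""
    matches = [o for s, p, o in facts if s == subject and p == prop]
    if len(matches) != 1:
        raise LookupError(f"expected one value for {subject}.{prop}, found {len(matches)}")
    return matches[0]

def core_conception(relationship, relationship_type):
    """Assemble the summary statement by following links from the relationship."""
    study = lookup(relationship, "isObservedIn")
    return (
        f"relationship {{{relationship_type}}} between "
        f"{{{lookup(relationship, 'isObservedRelationshipOf')}}} and "
        f"{{{lookup(relationship, 'isRelationshipWith')}}} observed in "
        f"{{{lookup(study, 'hasStudyPopulation')}}} from study "
        f"{{{lookup(study, 'hasPubMedID')}}}"
    )

print(core_conception("associated_with_1", "associated"))
```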
Representation of a SNP
a_3_intronic_SNP_rs3818247 ⊆ htSNP
⋂∃ hasAlleleComponent ( ∋ {DNA_Level_Allele_T} ⋂ { DNA_Level_Allele_G})
⋂∃ hasGenomeSite ( ∋ {flanking_3_intronic})
⋂∃ isGeneticVariantOf ( ∋ {hepatocyte_nuclear_factor-4_alpha})
⋂∃ hasVariantDatabase ( ∋ {HGVBase_SNP002310533} ⋂{dbSNP_rs3818247})
This means that the 3’ intronic SNP rs3818247 is an htSNP of hepatocyte nuclear factor 4 alpha, located in the flanking 3’ intronic sequence of the gene. The alleles of this SNP are T/G at the DNA level.
Reference database entries:
1) HGVBase: “SNP002310533”
2) dbSNP: “rs3818247”
Discussion
A hybrid of the middle-out and top-down approaches was used to build our ontology.
BFO is important for harmonizing the domain ontologies in our case.
The ontology can be applied to other complex diseases too.
We anticipate further applications of this ontology:
Information retrieval
Knowledge base development
Establishing logic rules
Mapping or linking to other ontologies, such as GO, Mammalian Phenotype, and so on.