Dynamic Cataloging of the Autism Phenome: Using Structured

An Ontology-Based Approach for Computational Phenomics: Application to Autism Spectrum Disorder
Amar K. Das, MD, PhD
Departments of Medicine and of Psychiatry and Behavioral Sciences
Stanford Center for Biomedical Informatics Research
Outline
Motivations
NDAR project
Phenologue project
Future Directions
NCBO Webinar
October 7, 2009
Motivation
Psychiatric Genetics
Phenotyping
Terminology
Ontology
Logic
Hasler G, et al. Toward constructing an endophenotype strategy for bipolar disorders. Biological Psychiatry (2006)
Represent findings and their links using structured knowledge
Phenomics
“A primary task for the new field of phenomics will be to clarify what, in practical terms, constitutes a phenotype and then to delineate the different phenotypic components that compose the phenome.”
Freimer & Sabatti, Nature Genetics (2003)
OMIM
dbGaP
Mailman MD, et al. Nature Genetics (2007)
PhenoWiki
PhenoWiki
Current Approaches
Lack of standardization
Lack of organization
Lack of computability
Autism DSM-IV Diagnosis
A total of six (or more) items from (1), (2), and (3), with at least
two from (1), and one each from (2) and (3)
(1) qualitative impairment in social interaction, as manifested by
at least two of the following:
a) marked impairments in the use of multiple nonverbal behaviors
such as eye-to-eye gaze, facial expression, body posture, and
gestures to regulate social interaction
b) failure to develop peer relationships appropriate to
developmental level
c) a lack of spontaneous seeking to share enjoyment, interests, or
achievements with other people (e.g., by a lack of showing,
bringing, or pointing out objects of interest to other people)
d) lack of social or emotional reciprocity
Autism DSM-IV Diagnosis
(2) qualitative impairments in communication as manifested
by at least one of the following:
a) delay in, or total lack of, the development of spoken
language (not accompanied by an attempt to compensate
through alternative modes of communication such as
gesture or mime)
b) in individuals with adequate speech, marked impairment
in the ability to initiate or sustain a conversation with others
c) stereotyped and repetitive use of language or
idiosyncratic language
d) lack of varied, spontaneous make-believe play or social
imitative play appropriate to developmental level
Autism DSM-IV Diagnosis
(3) restricted repetitive and stereotyped patterns of
behavior, interests and activities, as manifested by at least two of
the following:
a) encompassing preoccupation with one or more stereotyped
and restricted patterns of interest that is abnormal either in
intensity or focus
b) apparently inflexible adherence to specific, nonfunctional
routines or rituals
c) stereotyped and repetitive motor mannerisms (e.g., hand or
finger flapping or twisting, or complex whole body movements)
d) persistent preoccupation with parts of objects
Delays or abnormal functioning in at least one of the following
areas, with onset prior to age 3 years: (1) social interaction, (2)
language as used
NDAR (ndar.nih.gov)
Goals of NDAR
Develop standards to promote meta-analyses and cross-site comparisons of research data
Provide researchers access to useful software tools and infrastructure
Promote the sharing of research data relevant to ASD
NIH Research Support in Autism
$100 million/year in funding:
  Investigator-initiated grants (R01s)
  Special initiatives, e.g., RFA for genetics
  Centers and networks
  Training grants (to institutions and individuals)
New initiatives:
  Intramural Research Program on Autism
  Autism Centers of Excellence (ACE)
  National Database for Autism Research (NDAR)
  ARRA stimulus program
BIRN Mediator
NDAR System
Figure: NDAR system architecture. Components: Clinical Assessments (OpenClinica); Neuroimaging; Subject Tracking & Management; Image Analysis; Common Measures; Study Management; Image Processing; Genomics; BIRN Services & Resources; Security; Genomics data access; Portal; Image data access; Grid Computing; Collaboration; Data Integration; Query and Reporting; User Management; Data Integration Tools; Auditing; Data Storage Management.
NDAR Codebook
Phenotypes in Psychiatry
‘The observable structural and functional characteristics of an organism determined by its genotype and modulated by its environment’
Diagnostic component
Intermediate phenotype
Quantitative phenotype
Covariates
Example Query #1
Find all subjects who are verbal (ADI-R A14). Then look at their IQ (Cognitive Total IQ > 70) and whether or not they have seizures (Medical History Q10). Also find out if they have an abnormal MRI or any genetic abnormalities.
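The query combines one filter with several per-subject lookups. The following is a minimal Python sketch over an in-memory list, assuming a hypothetical flattened record (`SubjectRecord`) whose field names stand in for ADI-R A14, Cognitive Total IQ, and Medical History Q10; they are not the actual NDAR schema. It reads "IQ > 70" as a flag to report rather than as a second filter, which is only one possible interpretation of the query.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SubjectRecord:
    # Hypothetical flattened record; field names are illustrative, not the NDAR schema.
    subject_id: str
    verbal: bool                       # stands in for ADI-R item A14
    total_iq: Optional[float]          # Cognitive Total IQ (None if not assessed)
    seizures: Optional[bool]           # stands in for Medical History Q10
    abnormal_mri: Optional[bool]
    genetic_findings: List[str] = field(default_factory=list)


def example_query_1(records: List[SubjectRecord]) -> List[Dict[str, object]]:
    """Keep verbal subjects, then report IQ > 70, seizures, MRI, and genetic findings."""
    rows = []
    for r in records:
        if not r.verbal:
            continue  # the query starts from verbal subjects only
        rows.append({
            "subject_id": r.subject_id,
            "iq_over_70": r.total_iq is not None and r.total_iq > 70,
            "seizures": r.seizures,
            "abnormal_mri": r.abnormal_mri,
            "any_genetic_abnormality": bool(r.genetic_findings),
        })
    return rows


if __name__ == "__main__":
    demo = [  # made-up toy records
        SubjectRecord("S1", True, 85, False, False),
        SubjectRecord("S2", False, 60, True, None, ["variant_x"]),
        SubjectRecord("S3", True, 65, True, True, ["variant_x"]),
    ]
    for row in example_query_1(demo):
        print(row)
```

Against a real repository the same logic would be a join across several assessment tables; the flat record simply keeps the filtering step visible.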
Example Query #2
Use head circumference to categorize
macroencephaly. Then see if the
subjects differ in their ADOS, ADI-R,
cognitive, and language profiles, and
combine this with genetic data.
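One way to read this query is as a two-step analysis: derive a group label from head circumference, then compare profiles across groups. The sketch below assumes each subject already has a head-circumference z-score (`hc_z`) and uses a cutoff of 2.0 purely for illustration; the slide does not state a threshold, and the measure names (`ados_total`, `iq`) are hypothetical. Joining the genetic data would be a further step not shown here.

```python
from statistics import mean
from typing import Any, Dict, List, Sequence

Subject = Dict[str, Any]  # hypothetical flat record of measures


def categorize(subjects: List[Subject], z_cutoff: float = 2.0) -> Dict[str, List[Subject]]:
    """Split subjects by head-circumference z-score (z_cutoff is an assumed value)."""
    groups: Dict[str, List[Subject]] = {"macroencephaly": [], "typical": []}
    for s in subjects:
        key = "macroencephaly" if s["hc_z"] > z_cutoff else "typical"
        groups[key].append(s)
    return groups


def profile_means(groups: Dict[str, List[Subject]],
                  measures: Sequence[str] = ("ados_total", "iq")) -> Dict[str, Dict[str, float]]:
    """Mean of each (hypothetical) measure for every non-empty group."""
    return {
        name: {m: mean(s[m] for s in members) for m in measures}
        for name, members in groups.items()
        if members
    }
```

ADI-R, language, or other measures can be compared the same way by passing their names in `measures`.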
NDAR Project
Systematic Review
Ontology Development
Database Infrastructure
Systematic Review
Search query: “(ADI-R or ADOS or Vineland) and (genes or genetics) and autism”
26 of 43 papers relevant
Mean number of phenotypes per paper: 4.1 (range 1-13)
Three basic types of phenotype definition (1:1, sum, cutoff score)
Tu, S. W. AMIA Annual Proceedings (2008)
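The three basic types of phenotype definition can be expressed as small composable functions. This Python sketch is illustrative only: the item codes are hypothetical placeholders, and "1:1" is read here as one instrument item used directly, "sum" as the total of several items, and "cutoff score" as a threshold applied to either.

```python
from typing import Callable, Dict, Iterable

Assessment = Dict[str, float]          # instrument item code -> score
Score = Callable[[Assessment], float]


def one_to_one(item: str) -> Score:
    """1:1 type: the phenotype value is a single instrument item."""
    return lambda a: a[item]


def summed(items: Iterable[str]) -> Score:
    """Sum type: the phenotype value is the total of several items."""
    frozen = tuple(items)  # freeze so the returned function can be reused
    return lambda a: sum(a[i] for i in frozen)


def cutoff(score: Score, threshold: float) -> Callable[[Assessment], bool]:
    """Cutoff type: the phenotype is present when the score exceeds a threshold."""
    return lambda a: score(a) > threshold
```

Because `cutoff` accepts any score function, the same style covers both a thresholded single item and a thresholded sum.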
Systematic Review
Different terms
e.g., ‘age of first phrases’ and ‘age of onset of phrase speech’
Different cutoff scores
e.g., ‘delayed word’
Different definitions
e.g., ‘regression’
e.g., use of different instruments
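Two of these problems, synonymous terms and differing cutoffs, can be made concrete in a short sketch. The synonym table is hypothetical and covers only the two labels named above. The 24-month cutoff matches the Delayed_word rule later in this talk, while the 18-month alternative is an assumed value chosen to show how the same child can be classified differently; ages are assumed to be in months.

```python
from typing import Dict

# Hypothetical synonym table: study-specific labels -> one canonical concept.
SYNONYMS: Dict[str, str] = {
    "age of first phrases": "age_of_first_phrase",
    "age of onset of phrase speech": "age_of_first_phrase",
}


def canonical(label: str) -> str:
    """Normalize case and whitespace, then map to the canonical concept if known."""
    key = " ".join(label.lower().split())
    return SYNONYMS.get(key, key)


def delayed_word(first_word_age_months: float, cutoff_months: float) -> bool:
    """'Delayed word' under a given cutoff; studies may use different cutoffs."""
    return first_word_age_months > cutoff_months


if __name__ == "__main__":
    print(canonical("Age of First Phrases") == canonical("age of onset of phrase speech"))
    print(delayed_word(20, cutoff_months=18), delayed_word(20, cutoff_months=24))
```

A child whose first words came at 20 months counts as delayed under the 18-month cutoff but not under the 24-month one, which is exactly the comparability problem the review found.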
Ontology
A taxonomy with multiple link types, each with precise meaning
Figure: example taxonomy with the classes Clinical Research Study, Case Study, Clinical Trial Study, Controlled Case Study, and Study Arms
Perspectives on ‘Ontology’
Philosophy: the study of what entities and what types of entities exist in reality
Computer Science: a schema that represents a domain and is used to reason about the objects in that domain and the relations between them
Critical to the ‘Semantic Web’
Shared research and development plan to:
Provide explicit semantic meaning to data and knowledge shared on the Web
Bring structure to Web content
Advance the current state of the art in Web information retrieval, which is keyword searching
Distributed applications will be able to process data and knowledge automatically through the use of ontologies
OWL: Web Ontology Language
Advances current Semantic Web standards by using ontologies to represent knowledge
OWL can be used to build ontologies of high-level descriptions, based on three concepts:
Classes (e.g., Subject, Phenotype, Genotype)
Properties (e.g., isBearerOf, hasResult)
Individuals (e.g., “Macroencephaly”)
OWL: Web Ontology Language
Figure: Subject 011451 hasResult mutInRELN (an instance of Genotype); Subject 011451 isBearerOf Macroencephaly (an instance of Phenotype)
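The example is a small set of subject-predicate-object statements. As a library-free sketch (plain Python tuples, not an OWL serialization or any OWL API), the same facts and two simple lookups might look like this; `Subject_011451` is an illustrative identifier built from the slide's subject number, and `rdf:type` is used here only to mark class membership.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Facts from the example; "rdf:type" marks class membership.
TRIPLES: Set[Triple] = {
    ("Subject_011451", "rdf:type", "Subject"),
    ("mutInRELN", "rdf:type", "Genotype"),
    ("Macroencephaly", "rdf:type", "Phenotype"),
    ("Subject_011451", "hasResult", "mutInRELN"),
    ("Subject_011451", "isBearerOf", "Macroencephaly"),
}


def objects(triples: Set[Triple], subject: str, predicate: str) -> List[str]:
    """All objects linked to `subject` by `predicate`."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def instances_of(triples: Set[Triple], cls: str) -> List[str]:
    """All individuals asserted to belong to class `cls`."""
    return sorted(s for s, p, o in triples if p == "rdf:type" and o == cls)


if __name__ == "__main__":
    for subj in instances_of(TRIPLES, "Subject"):
        print(subj, objects(TRIPLES, subj, "hasResult"), objects(TRIPLES, subj, "isBearerOf"))
```

A real OWL toolkit adds what this sketch lacks: class hierarchies, property domains and ranges, and a reasoner that derives facts not stated explicitly.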
BIRNLex
A controlled terminology for annotation of BIRN data sources, focusing on imaging data from human subjects and mouse models
Terms cover neuroanatomy, molecular species, behavioral and cognitive processes, subject information, experimental practice and design
Basic Formal Ontology
An upper ontology which can be used to support the development of domain ontologies used in scientific research
All concepts are subclasses of:
Continuants: entities that exist in full at any time in which they exist at all
Occurrents: entities that have temporal parts and that happen, unfold, or develop through time
OBO Foundry
Ontologies should be orthogonal:
  Minimize overlap
  Each distinct entity type (universal) should be represented only once
Partition efforts in the OBO Foundry rationally to help organize and coordinate ontology development
Figure: OBO Foundry ontologies arranged by relation to time (continuant, independent or dependent; occurrent) and granularity (organ and organism; cell and cellular component; molecule). Independent continuants: Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO), Cell (CL), Cellular Component (FMA, GO), Molecule (ChEBI, SO, RnaO, PrO). Dependent continuants: Organ Function (FMP, CPRO), Phenotypic Quality (PaTO), Cellular Function (GO), Molecular Function (GO). Occurrents: Organism-Level Process (GO), Cellular Process (GO), Molecular Process (GO).
Figure credit: Chris Mungall, PATO
SWRL: Semantic Web Rule Language
W3C Member Submission for expressing logical rules that can be formulated in terms of OWL concepts
Rules in SWRL can be used to deduce new knowledge about an existing OWL ontology
The specification can be extended through the use of built-ins
Example SWRL Rule: hasUncle
hasParent(?x, ?y) ^ hasBrother(?y, ?z)
→ hasUncle(?x, ?z)
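A SWRL rule fires for every binding of its variables that satisfies all body atoms, so a variable shared between atoms (here ?y) acts as a join. A minimal Python sketch of applying the hasUncle rule to a set of facts (the individuals are made up) might be:

```python
from typing import Set, Tuple

Fact = Tuple[str, str, str]  # (property, subject, object)


def apply_has_uncle(facts: Set[Fact]) -> Set[Fact]:
    """hasParent(?x, ?y) ^ hasBrother(?y, ?z) -> hasUncle(?x, ?z)."""
    parents = [(x, y) for prop, x, y in facts if prop == "hasParent"]
    brothers = [(y, z) for prop, y, z in facts if prop == "hasBrother"]
    # The shared variable ?y joins the two body atoms.
    inferred = {("hasUncle", x, z) for x, y in parents for y2, z in brothers if y == y2}
    return facts | inferred  # rules only add facts; nothing is retracted
```

One pass is enough for this rule because its body never mentions hasUncle; rules that feed each other would need to be reapplied until no new facts appear.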
Example SWRL Rule: hasSister
Person(Amar) ^ hasSibling(Amar, ?s)
^ Woman(?s)
→ hasSister(Amar, ?s)
Example SWRL Rule: Child
Person(?p) ^ hasAge(?p,?age)
^ swrlb:lessThan(?age,17)
→ Child(?p)
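Built-ins such as swrlb:lessThan act as filters on values already bound by other atoms in the rule body. In this Python sketch, `operator.lt` stands in for swrlb:lessThan and a plain dict stands in for the Person and hasAge assertions; the names are hypothetical.

```python
import operator
from typing import Dict, Set

# Python stand-ins for the SWRL comparison built-ins used in this deck.
BUILTINS = {"swrlb:lessThan": operator.lt, "swrlb:greaterThan": operator.gt}


def infer_child(ages: Dict[str, float], threshold: float = 17) -> Set[str]:
    """Person(?p) ^ hasAge(?p, ?age) ^ swrlb:lessThan(?age, 17) -> Child(?p)."""
    less_than = BUILTINS["swrlb:lessThan"]
    return {person for person, age in ages.items() if less_than(age, threshold)}
```

Because the comparison is strict, a person aged exactly 17 is not inferred to be a Child under this rule.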
Rule-Based Methods
Extensions to SWRL:
Temporal: library of temporal built-ins
Query: extraction of results as a table
MakeSet: support for set-based operations
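Set-based operations matter because a plain SWRL rule reasons over one variable binding at a time and has no way to collect or count results. The Python sketch below only illustrates the idea of gathering results into per-key sets and extracting them as a table; it does not reproduce the syntax of the actual built-ins, and the phenotype name Delayed_phrase is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Mapping, Set, Tuple


def make_sets(rows: Iterable[Mapping[str, Hashable]], key: str,
              value: str) -> Dict[Hashable, Set[Hashable]]:
    """Collect `value` into one set per `key`; duplicates collapse."""
    sets: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for row in rows:
        sets[row[key]].add(row[value])
    return dict(sets)


def as_table(sets: Dict[Hashable, Set[Hashable]]) -> List[Tuple[Hashable, int]]:
    """Flatten to sorted (key, set size) rows, like a query result table."""
    return sorted((k, len(v)) for k, v in sets.items())
```

For example, per-subject phenotype assignments produced by rules can be collapsed into one row per subject with a count of distinct phenotypes.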
Development Methods
Extensions to BIRNLex
Encoding of phenotypes
Querying of NDAR database
Autism Assessment Result
Figure 1. The representation of data collected through the ADI-2003 autism assessment instrument as part of the autism ontology.
Phenotype Representation
Figure 2. The representation of the Status of age of words phenotype group as an OWL class partitioned by the possible statuses.
Phenotype Rule
ADI_2003_result(?assessment) ^
acqorlossoflang_aword(?assessment,?wordage) ^
swrlb:greaterThan(?wordage, 24) ^
subject_id(?assessment, ?subjectId) ^
orgtax:Human(?subject) ^
subject_id(?subject, ?subjectId)
→ birn_obo_ubo:bearer_of(?subject, Delayed_word)
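Procedurally, the rule joins each ADI-2003 assessment to the Human individual that has the same subject_id and tests the age at first words against 24. A minimal Python sketch of that evaluation follows. It assumes the age is recorded in months and uses hypothetical Python types (`ADI2003Result`, a dict of Human individuals) in place of the ontology classes.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class ADI2003Result:
    subject_id: str
    acqorlossoflang_aword: float  # age at first words; months assumed


def delayed_word_rule(assessments: List[ADI2003Result],
                      humans: Dict[str, str],
                      cutoff: float = 24) -> Set[Tuple[str, str]]:
    """Return (individual, "Delayed_word") bearer_of facts.

    `humans` maps each Human individual to its subject_id (hypothetical layout).
    """
    by_subject_id: Dict[str, List[str]] = {}
    for individual, sid in humans.items():
        by_subject_id.setdefault(sid, []).append(individual)

    inferred: Set[Tuple[str, str]] = set()
    for a in assessments:
        if a.acqorlossoflang_aword > cutoff:                       # swrlb:greaterThan(?wordage, 24)
            for individual in by_subject_id.get(a.subject_id, []):  # join on subject_id
                inferred.add((individual, "Delayed_word"))          # bearer_of(?subject, Delayed_word)
    return inferred
```

An assessment with no matching Human individual produces no inference, mirroring the fact that every atom in the rule body must be satisfied.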
Phenotype Rules
Ontology-Driven Querying
Young, L. IEEE CBMS (2009)
Phenologue Project
Develop an ontology of endophenotypes that maps brain connectivity, neural deficits, and genetic markers into a subject domain theory
Develop logic-based methods to encode and classify endophenotypes based on multi-scale measurements
Create tools to acquire new endophenotypes and annotate phenotype-genotype findings in online resources such as published literature
Develop query-elicitation methods that can evaluate hypotheses about the subject domain theory of endophenotypes using deductive inference
Phenologue Project
Figure: workflow connecting Query, Database, Catalog, Phenotype Definitions, New Associations, and Analysis
Rule Technologies
Rule paraphrasing
Rule elicitation
Rulebase visualization
Knowledge mining using rules
Rule Paraphrasing
Rule Elicitation
Rulebase Visualization
Computational Phenomics
Informatics methods to support phenomics:
Apply machine learning methods to discover groups of rules with common semantics
Use natural language processing methods to discover phenotype rules in published text
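As one illustration of grouping rules with common semantics (a simple stand-in, not the project's method), rules can be compared by the Jaccard similarity of the predicates in their bodies and linked into groups when that similarity reaches a threshold. The rule names, predicate sets, and the 0.5 threshold are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """len(a & b) / len(a | b), defined as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_rules(bodies: Dict[str, Set[str]], threshold: float = 0.5) -> List[List[str]]:
    """Single-linkage grouping of rules whose body predicates overlap enough."""
    parent = {name: name for name in bodies}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in combinations(bodies, 2):
        if jaccard(bodies[a], bodies[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups: Dict[str, List[str]] = {}
    for name in bodies:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())
```

Comparing predicate names only captures surface overlap; a semantic-similarity measure over the ontology's class hierarchy would be a natural refinement.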
Semantic Similarity
Future Directions
Expand phenotype categories
Use natural language processing methods to discover phenotype rules in published text
Apply machine learning methods to discover groups of rules with common semantics
Summary
The development of a standardized, organized, and computable set of phenotype terms is central to etiologic studies of complex disorders
The use of ontologies and rules to model phenotypes is feasible and can enable automated discovery of new phenotype-genotype relationships
Acknowledgments
Stanford Group: Martin O’Connor, Saeed Hassanpour, Duriel Hardy, Ravi Shankar, Lakshika Tennakoon, Samson Tu
National Center for Biomedical Ontology: Mark Musen, Daniel Rubin
NDAR/NIMH: Lynn Young, Matthew McAuliffe, Dan Hall, Lisa Gilotty
Biomedical Informatics Research Network: Bill Bug, Maryann Martone