Poster - Anil Jegga - Cincinnati Children's Hospital

An Integrative Approach for the Study of Sequence Variation Impact on Biological Processes, Diseases and Environmental Agents’ Risk
Sivakumar Gowrisankar, Amol S Deshmukh, Anil G Jegga and Bruce J Aronow
Department of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center and University of Cincinnati
A Systems Biology Integrative Approach
[Figure: Genomics Knowledge Platform and Biological Object Model. Labels: Genotype; Environment; Integrated Annotation; Biological Entities; Protein Interactions; Biological Pathways; Network Representation; Normal Cellular Function; Disease Processes; Novel Treatments; Biological Explanation; Mechanistic Explanation.]
[Figure: data sources. Sequence Databases; Pathway Databases; Protein-Protein Interactions; Gene Expression; Variation (SNPs); Protein Domains & 3D Structure; Taxonomy; Ontologies; MedLine; OMIM; GeneRIF; PreBIND; HPRD.]
[Figure: tools. PathMaker; PathBuilder; Canvas; GPB; Ontology Explorer; Complex Builder; GO Clusterer.]
Cognitive Processing (Researcher/Scientist Reasoning): Etiologies? Mechanisms? Signatures? Prevention? Treatment?
NIEHS Candidate Genes’ Categorization Based on GO (Biological Process)
Gene group: TNF, IL5, TNFRSF14, IL12B, IL12A, IL8, IL1B, IL4R, LTB, RAG1, TNFRSF6, TNFRSF17, APOE, TNFRSF7, TNFRSF4, TNFRSF9, TNFRSF5, F3, LTA
Gene group: FANCG, NBS1, RB1, TP53, CDKN2A
Are these functionally clustered proteins involved in a common biological network or interaction?
Does a SNP in one or more biological entities result in aberrations within a pathway and manifest as a disease or contribute to increased susceptibility to disease or an altered response to therapeutic agents?
Abstract
The integration of genomic sequence analyses from multiple species and strains,
along with protein interaction data and gene expression profiles that reflect specific
biological states and processes, has opened many new avenues to understand
specific biological systems. Nevertheless, formidable challenges remain to be
overcome for the improvement of prediction, diagnosis, prognosis, and treatment
of human diseases. Can we infer from large molecular datasets how different
biological entities are organized and interact, and then predict the effect that
genetic polymorphisms or sequence variations might confer on interconnected
biological processes? The integration of heterogeneous data and information is in
fact a key issue in functional genomics. An appropriate data model and consistent
methods for its integrated representation, analysis, and visualization have the
potential to pave the way for the emergence of discovery-driven science, enhance
hypothesis-generation, and provide new focus for experimental validation and
refinement. Thus, to represent the presence and impact of polymorphisms further
in the context of biological pathways, we have sought to unify our representation
of molecular, biological, and environmental entities such that biological knowledge
from experts and biomedical literature could be assembled in a storyboard canvas.
For example, the representation of a disease could consist of a biological process
composed of one or more pathways, within which entities (gene products,
complexes, and cellular and subcellular components) are subjected to one or more
interactions and transitions to disease-term-associated states. We have begun the
development of a suite of applications using a common database structure that can
represent biological processes using a host of publicly available data sources
including gene objects and biological ontologies that in turn represent systematic
abstractions of biomedical literature and expert knowledge. As part of this exercise,
we have compiled all existing protein-protein interactions from “interactome”-rich
databases (PreBIND, MINT, DIP and HPRD) and are mining the biomedical literature
for novel interactions unrepresented in these specialized databases. Our compiled
interaction data comply with the standards set out by the Proteomics Standards
Initiative (PSI), facilitating easy data exchange. As available annotations increase,
the challenge is to integrate biological process representation in such a way as to
increase our understanding rather than obscure it in convoluted figures or excessive
detail. The use of a network visualizer provides not only a lucid means of
summarizing existing biological knowledge about molecular behavior but also
helps in elucidating the potential implications sequence variations can have on
protein-protein interactions or the binding of specific transcription factors.
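As a rough illustration of compiling interactions from several databases into a single network, the sketch below merges per-source interaction pairs and looks up the partners of a protein that carries a variant. It is a minimal sketch under stated assumptions, not the poster's implementation: the function names, the upper-casing of symbols, and the placeholder identifiers (GENE_A and so on) are all hypothetical, and the toy records are invented rather than taken from HPRD or DIP.

```python
from collections import defaultdict


def merge_interactions(sources):
    """Merge per-database interaction pairs into one undirected network.

    `sources` maps a database name to an iterable of (protein_a, protein_b)
    pairs. The result maps each unordered pair to the set of databases
    that report it.
    """
    merged = defaultdict(set)
    for db_name, pairs in sources.items():
        for a, b in pairs:
            # Assumption: symbols are normalized by upper-casing only.
            a, b = a.upper(), b.upper()
            if a == b:
                continue  # self-interactions are skipped in this sketch
            merged[frozenset((a, b))].add(db_name)
    return dict(merged)


def partners(merged, protein):
    """Return the interaction partners of `protein` in the merged network."""
    protein = protein.upper()
    found = set()
    for pair in merged:
        if protein in pair:
            found |= pair - {protein}
    return found


# Placeholder records, invented for illustration only.
toy_sources = {
    "HPRD": [("GENE_A", "GENE_B"), ("GENE_C", "GENE_A")],
    "DIP": [("gene_a", "gene_c"), ("GENE_D", "GENE_E")],
}
network = merge_interactions(toy_sources)
print(sorted(partners(network, "GENE_A")))
```

Keeping the set of supporting databases per pair makes it possible to see which interactions are reported by more than one source before they are drawn in a network view.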
XPrInt: Extracting & Compiling Protein Interactions
Co-citation in literature abstracts using gene/protein symbols and “interactome-specific” keywords
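The co-citation idea can be sketched as follows: an abstract counts as evidence for a candidate interaction when it mentions two known symbols together with at least one interaction keyword. This is a simplified sketch, not XPrInt itself. The keyword list, the function name, and the toy abstracts are assumptions (the poster does not enumerate its keywords), and matching is case-sensitive on whole tokens.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical keyword list, chosen for illustration only.
INTERACTION_KEYWORDS = {"binds", "interacts", "phosphorylates", "complex"}


def cocitations(abstracts, symbols, keywords=INTERACTION_KEYWORDS):
    """Count symbol pairs co-mentioned in abstracts that contain a keyword."""
    symbols = set(symbols)
    counts = Counter()
    for text in abstracts:
        tokens = re.findall(r"[A-Za-z0-9]+", text)
        if not {t.lower() for t in tokens} & keywords:
            continue  # no interaction keyword in this abstract
        mentioned = sorted(set(tokens) & symbols)
        for pair in combinations(mentioned, 2):
            counts[pair] += 1
    return counts


# Toy abstracts with placeholder symbols, invented for illustration.
toy_abstracts = [
    "GENEA binds GENEB in vitro.",
    "GENEA interacts with GENEB and GENEC.",
    "GENEA and GENEC were measured.",
]
pair_counts = cocitations(toy_abstracts, {"GENEA", "GENEB", "GENEC"})
print(pair_counts.most_common())
```

Pairs found this way are only candidates: co-mention with a keyword does not establish that the two proteins interact, so the counts are best read as a ranking for manual review.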
[Figure labels: GKP-PathMaker; Gene Summary; Other Databases; GKP Object; Biomedical Discovery Process; Genome]
PatholoGene
PatholoGene – Development of a system to link biological entities, anatomy,
pathways and diseases using the UMLS Semantic Network, NCBI-OMIM and
MedLine abstract parsing with ICD10 disease terms and gene symbols. The
Semantic Network, through its semantic types, provides a categorization of all
UMLS Metathesaurus concepts. The links between the semantic types provide the
structure for the Network and represent important relationships in the biomedical
domain. The UMLS Metathesaurus contains information about biomedical
concepts and terms from many controlled vocabularies and classifications used in
patient records, bibliographic and full-text databases, and expert systems. As a test
case, we illustrate the analysis of colon cancer as a function of anatomy, pathology,
etiology and disease progression.
Anatomy Ontology (figure labels): Hollow viscus; Organ with organ cavity; Intestines; Large Intestine; Large intestinal structure; Region of large intestine; Colon structure; Colon
Disease Ontology (figure labels): Inborn Genetic Diseases; Neoplasms; Hereditary Neoplastic Syndromes; Hereditary NonPolyposis Colon Cancer; HNPCC (hMSH2, hMLH1, hPMS1, hPMS2)
12 Siblings (UMLS – Concepts)
→Adenomatous Polyposis Coli
→Basal Cell Nevus Syndrome
→Colorectal Neoplasms, Hereditary Nonpolyposis
→Dysplastic Nevus Syndrome
→Exostoses, Multiple Hereditary
→Hamartoma Syndrome, Multiple
→Li-Fraumeni Syndrome
→Multiple Endocrine Neoplasia
→Nephroblastoma
→Neurofibromatoses
→Peutz-Jeghers Syndrome
→Sturge-Weber Syndrome
PatholoGene Report: Extracting relationships between disease, anatomy and genes.
[Figure labels: Biological Entities; Expert Curated; Publishable; Gene; Molecule; Map; BioMaterial; Ontologies; Regulome; Proteome; Interactome; Pharmacogenome; Metabolome; Physiome; Pathome; Variome; Transcriptome]
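The sibling and ancestor relationships in the PatholoGene panel can be sketched with a small child-to-parents map. This is a minimal sketch under stated assumptions: the edges below are an illustrative reading of the poster's disease labels, not an export of the UMLS Metathesaurus, only three of the twelve sibling concepts are included, and the function names are hypothetical.

```python
# Child -> parents; a concept may have more than one parent.
# Assumed edges for illustration, not taken from a UMLS release.
PARENTS = {
    "Hereditary Neoplastic Syndromes": {"Neoplasms", "Inborn Genetic Diseases"},
    "Colorectal Neoplasms, Hereditary Nonpolyposis": {
        "Hereditary Neoplastic Syndromes"
    },
    "Li-Fraumeni Syndrome": {"Hereditary Neoplastic Syndromes"},
    "Peutz-Jeghers Syndrome": {"Hereditary Neoplastic Syndromes"},
}


def ancestors(concept, parents=PARENTS):
    """Return every concept reachable by following parent links upward."""
    seen = set()
    stack = [concept]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def siblings(concept, parents=PARENTS):
    """Return the concepts that share at least one parent with `concept`."""
    own = parents.get(concept, set())
    return {c for c, ps in parents.items() if c != concept and ps & own}


hnpcc = "Colorectal Neoplasms, Hereditary Nonpolyposis"
print(sorted(siblings(hnpcc)))
print(sorted(ancestors(hnpcc)))
```

Sibling concepts are useful here because genes linked to one hereditary neoplastic syndrome can suggest candidate genes or pathways to examine for its neighbors in the hierarchy.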
Future Directions
Unified Representation of Disease States and Biological Processes using Clinical Phenotype, Molecular Signatures, and Genetic Attributes
[Figure labels: Disease State A; Disease State B; Sample-Centered Genetic and Genomic Data; Disease Process Modeling Tool; Analysis, Diagnosis and Prediction; Therapeutic Intervention; New Insights & Hypotheses]
References & Support
1. XPrInt and PatholoGene: http://abstrainer.cchmc.org
2. UMLS Knowledge Source Server: http://umlsks.nlm.nih.gov
3. Open Biological Ontologies: http://obo.sourceforge.net
Support: NIEHS U01 ES11038 Mouse Centers Genomics Consortium